Article

Species Tree Estimation from Gene Trees by Minimizing Deep Coalescence and Maximizing Quartet Consistency: A Comparative Study and the Presence of Pseudo Species Tree Terraces

Journal

Systematic Biology
Volume 70, Issue 6, Pages 1213-1231

Publisher

Oxford University Press
DOI: 10.1093/sysbio/syab026

Keywords

Gene tree; incomplete lineage sorting; phylogenomic analysis; species tree; summary method

Funding

  1. Information and Communication Technology Division (ICT Division), Government of the People's Republic of Bangladesh

Estimating species trees from multilocus data sets is challenging because incomplete lineage sorting (ILS) causes gene tree heterogeneity. Summary methods estimate a species tree by combining gene trees and optimizing a score. This study explores the presence and impact of equally optimal species trees in estimation with ILS-aware criteria. The experiments indicate that the minimize deep coalescence (MDC) criterion can yield quartet consistency scores competitive with those of the maximize quartet consistency (MQC) criterion while producing less accurate trees, which underscores the importance of accounting for equally optimal species trees in phylogenomic inference with summary methods.
Species tree estimation from multilocus data sets is extremely challenging, especially when gene trees vary across the genome because of incomplete lineage sorting (ILS). Summary methods first estimate gene trees and then combine them into a species tree by optimizing an optimality score. In this study, we extend and adapt the concept of phylogenetic terraces to species tree estimation from a set of gene trees, where multiple species trees with distinct topologies may have exactly the same optimality score (e.g., quartet score or extra lineage score). In particular, we investigate the presence and impact of equally optimal trees in ILS-aware species tree estimation from multilocus data using summary methods. We analyze two of the most popular ILS-aware optimization criteria: maximize quartet consistency (MQC) and minimize deep coalescence (MDC). Methods based on MQC are provably statistically consistent, whereas MDC is not a statistically consistent criterion for species tree estimation. We present a comprehensive comparative study of these two criteria. Our experiments on a collection of data sets simulated under ILS indicate that MDC may achieve quartet consistency scores competitive with, or identical to, those of MQC, yet could be significantly worse than MQC in terms of tree accuracy, demonstrating the presence and impact of equally optimal species trees. This is the first known study to provide the conditions under which data sets have equally optimal trees in the context of phylogenomic inference using summary methods.
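To make the two criteria and the idea of a tie concrete, the following is a minimal sketch, not the authors' implementation. It assumes rooted binary trees written as nested Python tuples, complete gene trees, and exactly one sampled allele per species labelled by the species name; the function names (`quartet_score`, `extra_lineages`, `mdc_score`) are hypothetical. The quartet score counts (gene tree, four-taxon subset) pairs on which a candidate species tree induces the same unrooted quartet as the gene tree (the quantity MQC maximizes). The extra lineage count uses the cluster-based formulation of deep coalescence: for each species cluster, count the maximal gene-tree clusters nested inside it and add that count minus one (the quantity MDC minimizes). Enumerating every quartet costs O(n^4) per gene tree, so this is only suitable for toy inputs.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree):
    """Return the leaf sets of all internal nodes (including the root)."""
    out = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        out.append(below)
        return below

    walk(tree)
    return out


def quartet_topology(tree_clusters, quartet):
    """Return the unrooted split {{a,b},{c,d}} a binary tree induces on 4 taxa."""
    q = frozenset(quartet)
    for c in tree_clusters:
        side = c & q
        if len(side) == 2:
            return frozenset([side, q - side])
    return None  # unresolved; only possible for non-binary trees


def quartet_score(species_tree, gene_trees):
    """Number of (gene tree, quartet) pairs on which species_tree agrees."""
    taxa = sorted(leaves(species_tree))
    s_clusters = clusters(species_tree)
    score = 0
    for g in gene_trees:
        g_clusters = clusters(g)
        for quartet in combinations(taxa, 4):
            t = quartet_topology(s_clusters, quartet)
            if t is not None and t == quartet_topology(g_clusters, quartet):
                score += 1
    return score


def extra_lineages(species_tree, gene_tree):
    """Deep-coalescence cost of one gene tree: sum over species clusters of
    (number of maximal gene clusters inside that cluster) - 1.
    Species leaves and the root always contribute 0 under one allele per species."""
    g_clusters = clusters(gene_tree) + [frozenset([x]) for x in leaves(gene_tree)]
    total = 0
    for s in clusters(species_tree):
        inside = [c for c in g_clusters if c <= s]
        maximal = [c for c in inside if not any(c < d for d in inside)]
        total += len(maximal) - 1
    return total


def mdc_score(species_tree, gene_trees):
    """Total extra lineages of species_tree over all gene trees (lower is better)."""
    return sum(extra_lineages(species_tree, g) for g in gene_trees)


if __name__ == "__main__":
    genes = [(("a", "b"), ("c", "d")), (("a", "c"), ("b", "d"))]
    candidates = {
        "S1": (("a", "b"), ("c", "d")),
        "S2": (("a", "c"), ("b", "d")),
        "S3": (("a", "d"), ("b", "c")),
    }
    for name, s in candidates.items():
        print(name, quartet_score(s, genes), mdc_score(s, genes))
```

With these two conflicting gene trees, S1 and S2 have distinct topologies but the same quartet score (1 each) and the same extra lineage count (2 each), so neither criterion can prefer one over the other, while S3 scores worse on both (0 and 4). This is a toy instance of the kind of tie the study calls a pseudo species tree terrace; real data sets need many more taxa and gene trees.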
