Article

Dissecting Incongruence between Concatenation- and Quartet-Based Approaches in Phylogenomic Data

Journal

SYSTEMATIC BIOLOGY
Volume 70, Issue 5, Pages 997-1014

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/sysbio/syab011

Keywords

Conflict; gene tree; phylogenetic signal; phylogenetics; phylogenomics; Tree of Life

Funding

  1. National Natural Science Foundation of China [32071665]
  2. Thousand Youth Talents Program
  3. Howard Hughes Medical Institute
  4. National Science Foundation [DEB-1442113]
  5. Guggenheim Foundation
  6. Burroughs Wellcome Fund

Topological conflict between concatenation- and coalescent-based methods is common in phylogenomic data. This study found that 30-36% of genes in animal, fungal, and plant phylogenomic data matrices show inconsistency between likelihood-based and quartet-based signals: such a gene has a higher log-likelihood score but a lower quartet score for one topology than for the other. Inconsistent genes are more likely than consistent genes to show high levels of gene tree discordance and to recover neither of the conflicting topologies. Removing inconsistent genes can reduce incongruence and improve accuracy in data sets with low levels of incomplete lineage sorting and low to medium levels of gene tree estimation error. In data sets with higher levels of incomplete lineage sorting and gene tree estimation error, removal reduces or eliminates incongruence, but the resulting species phylogenies are not always topologically identical to the true species trees.
Topological conflict or incongruence is widespread in phylogenomic data. Concatenation- and coalescent-based approaches often result in incongruent topologies, but the causes of this conflict can be difficult to characterize. We examined incongruence stemming from conflict between the likelihood-based signal (quantified by the difference in gene-wise log-likelihood score, or ΔGLS) and the quartet-based topological signal (quantified by the difference in gene-wise quartet score, or ΔGQS) for every gene in three phylogenomic studies in animals, fungi, and plants, which were chosen because their concatenation-based IQ-TREE (T1) and quartet-based ASTRAL (T2) phylogenies are known to produce eight conflicting internal branches (bipartitions). By comparing the types of phylogenetic signal for all genes in these three data matrices, we found that 30-36% of genes in each data matrix are inconsistent, that is, each of these genes has a higher log-likelihood score for T1 than for T2 (i.e., ΔGLS > 0) whereas its T1 topology has a lower quartet score than its T2 topology (i.e., ΔGQS < 0), or vice versa. Comparison of inconsistent and consistent genes using a variety of metrics (e.g., evolutionary rate, gene tree topology, distribution of branch lengths, hidden paralogy, and gene tree discordance) showed that inconsistent genes are more likely to recover neither T1 nor T2 and have higher levels of gene tree discordance than consistent genes. Simulation analyses demonstrated that the removal of inconsistent genes from data sets with low levels of incomplete lineage sorting (ILS) and low and medium levels of gene tree estimation error (GTEE) reduced incongruence and increased accuracy. In contrast, removal of inconsistent genes from data sets with medium and high ILS levels and high GTEE levels eliminated or extensively reduced incongruence, but the resulting congruent species phylogenies were not always topologically identical to the true species trees.
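
The consistency criterion described in the abstract can be illustrated with a minimal sketch. It assumes that the two per-gene differences have already been computed elsewhere, with both taken as T1 minus T2. The function names, the toy values, and the treatment of genes with a difference of exactly zero as "uninformative" are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, Tuple


def classify_gene(delta_gls: float, delta_gqs: float) -> str:
    """Label one gene from the signs of its two signal differences.

    delta_gls: log-likelihood score for T1 minus that for T2.
    delta_gqs: quartet score for T1 minus that for T2.
    """
    # Assumption: a zero difference favors neither topology, so the gene
    # is set aside rather than counted as consistent or inconsistent.
    if delta_gls == 0 or delta_gqs == 0:
        return "uninformative"
    # Consistent: both signals favor the same topology (same sign).
    same_sign = (delta_gls > 0) == (delta_gqs > 0)
    return "consistent" if same_sign else "inconsistent"


def inconsistent_fraction(signals: Dict[str, Tuple[float, float]]) -> float:
    """Fraction of genes whose two signals favor different topologies."""
    labels = [classify_gene(gls, gqs) for gls, gqs in signals.values()]
    if not labels:
        return 0.0
    return labels.count("inconsistent") / len(labels)


# Hypothetical toy values: gene name -> (delta GLS, delta GQS).
toy_signals = {
    "geneA": (12.4, 3.0),   # both favor T1
    "geneB": (5.1, -2.0),   # likelihood favors T1, quartets favor T2
    "geneC": (-0.8, -1.0),  # both favor T2
    "geneD": (-3.3, 4.0),   # likelihood favors T2, quartets favor T1
}

if __name__ == "__main__":
    for name, (gls, gqs) in toy_signals.items():
        print(name, classify_gene(gls, gqs))
    print("inconsistent fraction:", inconsistent_fraction(toy_signals))
```

In this sketch, filtering a data matrix down to its consistent genes would amount to keeping only the entries labeled "consistent" before re-running the species tree inference.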
