Article

Modeling Site Heterogeneity with Posterior Mean Site Frequency Profiles Accelerates Accurate Phylogenomic Estimation

Journal

SYSTEMATIC BIOLOGY
Volume 67, Issue 2, Pages 216-235

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/sysbio/syx068

Keywords

Long-branch attraction; long-branch repulsion; maximum likelihood; mixture model; posterior mean site frequency; site heterogeneity

Funding

  1. Natural Sciences and Engineering Research Council of Canada
  2. Canadian Institute for Advanced Research Program in Integrated Microbial Diversity
  3. Canada Research Chairs program
  4. Austrian Science Fund [FWF I-2805-B29]
  5. Austrian Science Fund (FWF) [I2805]

Proteins have distinct structural and functional constraints at different sites that lead to site-specific preferences for particular amino acid residues as the sequences evolve. Heterogeneity in the amino acid substitution process between sites is not modeled by commonly used empirical amino acid exchange matrices. Such model misspecification can lead to artefacts in phylogenetic estimation such as long-branch attraction. Although sophisticated site-heterogeneous mixture models have been developed to address this problem in both Bayesian and maximum likelihood (ML) frameworks, their formidable computational time and memory usage severely limit their use in large phylogenomic analyses. Here we propose a posterior mean site frequency (PMSF) method as a rapid and efficient approximation to full empirical profile mixture models for ML analysis. The PMSF approach assigns to each site a conditional mean amino acid frequency profile, calculated from a mixture model fitted to the data using a preliminary guide tree. These PMSF profiles can then be used for in-depth tree-searching in place of the full mixture model. Compared with widely used empirical mixture models with k classes, our implementation of PMSF in IQ-TREE (http://www.iqtree.org) speeds up the computation by approximately k/1.5-fold and requires a small fraction of the RAM. Furthermore, this speedup allows, for the first time, full nonparametric bootstrap analyses to be conducted under complex site-heterogeneous models on large concatenated data matrices. Our simulations and empirical data analyses demonstrate that PMSF can effectively ameliorate long-branch attraction artefacts. In some empirical and simulation settings, PMSF provided more accurate estimates of phylogenies than the mixture models from which the profiles derive.
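The idea of a "conditional mean amino acid frequency profile" can be illustrated with a minimal sketch. This is not the IQ-TREE implementation: it assumes the likelihood of one site under each mixture class has already been computed on the guide tree, and the function name, argument names, and toy numbers below are all hypothetical. Under those assumptions, the site profile is the class profiles averaged by the posterior probability of each class at that site.

```python
def posterior_mean_site_frequency(class_weights, class_site_likelihoods, class_profiles):
    """Posterior mean frequency profile for a single alignment site.

    class_weights          -- mixture weights of the k classes (assumed to sum to 1)
    class_site_likelihoods -- likelihood of this site under each class, assumed
                              to be precomputed on the guide tree
    class_profiles         -- k frequency vectors, one per class
    """
    # Joint term per class: mixture weight times the site likelihood under that class.
    joint = [w * lk for w, lk in zip(class_weights, class_site_likelihoods)]
    total = sum(joint)
    if total <= 0.0:
        raise ValueError("site likelihood must be positive for at least one class")

    # Posterior probability that the site belongs to each class.
    posteriors = [j / total for j in joint]

    # Average the class profiles, weighted by the class posteriors.
    n_states = len(class_profiles[0])
    return [
        sum(p * profile[a] for p, profile in zip(posteriors, class_profiles))
        for a in range(n_states)
    ]


# Toy example: two classes over a three-state alphabet (real profiles have 20 states).
toy_profile = posterior_mean_site_frequency(
    class_weights=[0.5, 0.5],
    class_site_likelihoods=[0.03, 0.01],
    class_profiles=[[0.8, 0.1, 0.1], [0.2, 0.4, 0.4]],
)
```

In this toy case the class posteriors are 0.75 and 0.25, so the site profile is pulled toward the first class. Once every site has a single fixed profile, the likelihood no longer needs to be summed over all k classes during tree search, which is consistent with the reported speedup growing with k.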
