Article

Evaluating Fast Maximum Likelihood-Based Phylogenetic Programs Using Empirical Phylogenomic Data Sets

MOLECULAR BIOLOGY AND EVOLUTION (2018)

Journal

MOLECULAR BIOLOGY AND EVOLUTION

Volume 35, Issue 2, Pages 486-503

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/molbev/msx302

Keywords

molecular evolution; tree space; topology; heuristic search

Categories

Biochemistry & Molecular Biology; Evolutionary Biology; Genetics & Heredity

Funding

National Science Foundation [DEB-1442113, DEB-1442148]
DOE Great Lakes Bioenergy Research Center (DOE Office of Science) [BER DE-FC02-07ER64494]
USDA National Institute of Food and Agriculture (Hatch project) [1003258]
National Key Project for Basic Research of China (973 Program) [2015CB150600]
Pew Charitable Trusts
NIFA [1003258, 690581] Funding Source: Federal RePORTER
Direct For Biological Sciences
Division Of Environmental Biology [1442148] Funding Source: National Science Foundation

Abstract

The sizes of the data matrices assembled to resolve branches of the tree of life have increased dramatically, motivating the development of programs for fast, yet accurate, inference. For example, several different fast programs have been developed in the very popular maximum likelihood framework, including RAxML/ExaML, PhyML, IQ-TREE, and FastTree. Although these programs are widely used, a systematic evaluation and comparison of their performance using empirical genome-scale data matrices has so far been lacking. To address this question, we evaluated these four programs on 19 empirical phylogenomic data sets with hundreds to thousands of genes and up to 200 taxa with respect to likelihood maximization, tree topology, and computational speed. For single-gene tree inference, we found that the more exhaustive and slower strategies (ten searches per alignment) outperformed faster strategies (one tree search per alignment) using RAxML, PhyML, or IQ-TREE. Interestingly, single-gene trees inferred by the three programs yielded comparable coalescent-based species tree estimations. For concatenation-based species tree inference, IQ-TREE consistently achieved the best-observed likelihoods for all data sets, and RAxML/ExaML was a close second. In contrast, PhyML often failed to complete concatenation-based analyses, whereas FastTree was the fastest but generated lower likelihood values and more dissimilar tree topologies in both types of analyses. Finally, data matrix properties, such as the number of taxa and the strength of phylogenetic signal, sometimes substantially influenced the programs' relative performance. Our results provide real-world gene and species tree phylogenetic inference benchmarks to inform the design and execution of large-scale phylogenomic data analyses.
