Article

Phylogeny Estimation Given Sequence Length Heterogeneity

Journal

SYSTEMATIC BIOLOGY
Volume 70, Issue 2, Pages 268-282

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/sysbio/syaa058

Keywords

Phylogeny estimation; sequence length heterogeneity; phylogenetic placement

Funding

  1. US National Science Foundation [ABI-1458652, 1513629]
  2. Division of Information & Intelligent Systems
  3. Directorate for Computer & Information Science & Engineering [1513629] (funding source: National Science Foundation)

Abstract

Phylogeny estimation is a major step in many biological studies and has many well-known challenges. With the dropping cost of sequencing technologies, biologists now have increasingly large datasets available for use in phylogeny estimation. Here we address the challenge of estimating a tree given large datasets with a combination of full-length sequences and fragmentary sequences, which can arise for a variety of reasons, including sample collection, sequencing technologies, and analytical pipelines. We compare two basic approaches: (1) computing an alignment on the full dataset and then computing a maximum likelihood tree on the alignment, or (2) constructing an alignment and tree on the full-length sequences and then using phylogenetic placement to add the remaining sequences (which will generally be fragmentary) into the tree. We explore these two approaches on a range of simulated datasets, each with 1000 sequences and varying in rates of evolution, and on two biological datasets. Our study shows some striking performance differences between methods, especially when there is substantial sequence length heterogeneity and high rates of evolution. We find in particular that using UPP to align sequences and RAxML to compute a tree on the alignment provides the best accuracy, substantially outperforming trees computed using phylogenetic placement methods. We also find that FastTree has poor accuracy on alignments containing fragmentary sequences. Overall, our study provides insights into the literature comparing different methods and pipelines for phylogenetic estimation, and suggests directions for future method development.
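Approach (2) first needs a rule for deciding which sequences are "full-length" (used to build the backbone alignment and tree) and which are left over for phylogenetic placement. The abstract does not specify that rule, so the sketch below assumes a simple one: a sequence is full-length if its ungapped length lies within a fixed fraction of the median length. The function name `split_by_length` and the 25% `tolerance` are illustrative assumptions, not the paper's procedure or any tool's documented default.

```python
from statistics import median


def split_by_length(seqs, tolerance=0.25):
    """Partition sequences into a backbone set and a query set.

    seqs: dict mapping sequence name -> sequence string (gaps '-' ignored).
    Assumption for illustration: a sequence is treated as full-length
    (backbone) if its ungapped length is within `tolerance` of the median
    length; all other sequences (mostly fragments, in this setting) are
    returned as queries to be added later by phylogenetic placement.
    """
    lengths = {name: len(s.replace("-", "")) for name, s in seqs.items()}
    med = median(lengths.values())
    low, high = (1 - tolerance) * med, (1 + tolerance) * med

    backbone, queries = {}, {}
    for name, s in seqs.items():
        target = backbone if low <= lengths[name] <= high else queries
        target[name] = s
    return backbone, queries


if __name__ == "__main__":
    data = {"a": "ACGTACGTAC", "b": "ACGTACGTA", "c": "ACGTACGTACG", "d": "ACG"}
    backbone, queries = split_by_length(data)
    print(sorted(backbone), sorted(queries))
```

With these four sequences the median length is 9.5, so `a`, `b`, and `c` form the backbone and the short sequence `d` is queued for placement. Approach (1) skips this split and aligns all sequences together before tree estimation.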
