
Barking Up the Wrong Treelength: The Impact of Gap Penalty on Alignment and Tree Accuracy

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2009)

Journal

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS

Volume 6, Issue 1, Pages 7-21

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TCBB.2008.63

Keywords

Markov processes; biology and genetics

Categories

Biochemical Research Methods; Computer Science, Interdisciplinary Applications; Mathematics, Interdisciplinary Applications; Statistics & Probability

Funding

Sir Isaac Newton Institute for Mathematical Sciences
US National Science Foundation [DEB-07330209, ITR-0331453, ITR-0121680, ITR-0114387, EIA-0303609]


Abstract

The current technique for estimating phylogenies from sequence data uses two phases: first, the sequences are aligned, and then the tree is estimated using the obtained alignment. More recently, however, several computational methods have been developed for simultaneous estimation of the alignment and the tree, of which POY (a heuristic for the NP-hard minimum treelength problem, which extends maximum parsimony (MP) so that gaps contribute to the cost) is the most popular. In a 2007 paper published in Systematic Biology, Ogden and Rosenberg reported on a simulation study in which they compared POY to the very simple two-phase method of estimating the alignment using ClustalW and then analyzing the resultant alignment using MP. They found that in the overwhelming majority of cases, ClustalW + MP outperformed POY with respect to alignment and phylogenetic tree accuracy, and they concluded that simultaneous estimation techniques (collectively referred to as Direct Optimization) are not competitive with two-phase techniques. Our paper presents a simulation study in which we take a closer look at the points raised by Ogden and Rosenberg. Instead of focusing specifically on POY, we focus on the NP-hard optimization problem that POY addresses: minimizing treelength. Since this optimization depends upon the specific edit distance criterion used to score a tree, our study considers the impact of the gap penalty (in particular, affine versus simple) on the accuracy of the resultant alignment and tree that optimize the treelength for that gap penalty function.
Our study suggests that the poor performance observed for POY by Ogden and Rosenberg is due to the simple gap penalties they used to score alignment/tree pairs, but also suggests the intriguing possibility that optimizing under an affine gap penalty might produce alignments that are not only better than ClustalW alignments, but competitive with (or perhaps better than) those produced by the best current alignment methods. This study also shows that optimizing under this affine gap penalty produces trees whose topological accuracy is better than that of ClustalW + MP, and competitive with the current best two-phase methods.
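
To make the distinction between simple and affine gap penalties concrete, the following is a minimal sketch and not the scoring scheme used in the paper. The function names, the example rows, and the parameter values (`per_site`, `gap_open`, `gap_extend`) are all hypothetical, and the affine form assumed here charges `gap_open + gap_extend * length` per gap run, which is one of several common conventions. It shows that a simple penalty cannot tell one long gap from many scattered ones, while an affine penalty favors the single long gap.

```python
def gap_runs(row):
    """Return the lengths of each maximal run of '-' in one alignment row."""
    runs, current = [], 0
    for ch in row:
        if ch == "-":
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def simple_gap_cost(row, per_site=1.0):
    """Simple penalty: every gapped site costs the same (hypothetical value)."""
    return per_site * sum(gap_runs(row))


def affine_gap_cost(row, gap_open=4.0, gap_extend=1.0):
    """Affine penalty: each gap run pays an opening cost plus a per-site cost.

    Assumed convention: cost(run) = gap_open + gap_extend * len(run).
    """
    return sum(gap_open + gap_extend * n for n in gap_runs(row))


# Two hypothetical rows, each with four gapped sites.
one_long_gap = "AC----GT"    # a single run of length 4
scattered_gaps = "A-C-G-T-"  # four runs of length 1

for name, row in [("one long gap", one_long_gap), ("scattered gaps", scattered_gaps)]:
    print(name, simple_gap_cost(row), affine_gap_cost(row))
```

Under these assumed values both rows score the same with the simple penalty, whereas the affine penalty charges the scattered gaps more because each run pays its own opening cost. Treelength sums such edit costs over the edges of a tree, so the choice of penalty changes which alignment/tree pairs are optimal.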
