Barking Up the Wrong Treelength: The Impact of Gap Penalty on Alignment and Tree Accuracy

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2009)

Journal

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS

Volume 6, Issue 1, Pages 7-21

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TCBB.2008.63

Keywords

Markov processes; biology and genetics

Funding

Sir Isaac Newton Institute for Mathematical Sciences
US National Science Foundation [DEB-07330209, ITR-0331453, ITR-0121680, ITR-0114387, EIA-0303609]

Abstract

The current technique for estimating phylogenies from sequence data uses two phases: first, the sequences are aligned, and then the tree is estimated using the obtained alignment. More recently, however, several computational methods have been developed for simultaneous estimation of the alignment and the tree, of which POY (a heuristic for the NP-hard minimum treelength problem, which extends maximum parsimony (MP) so that gaps contribute to the cost) is the most popular. In a 2007 paper published in Systematic Biology, Ogden and Rosenberg reported on a simulation study in which they compared POY to the very simple two-phase method of estimating the alignment using ClustalW and then analyzing the resultant alignment using MP. They found that in the overwhelming majority of the cases, ClustalW + MP outperformed POY with respect to alignment and phylogenetic tree accuracy, and they concluded that simultaneous estimation techniques (collectively referred to as Direct Optimization) are not competitive with two-phase techniques. Our paper presents a simulation study in which we take a closer look at the points raised by Ogden and Rosenberg. Instead of focusing specifically on POY, we focus on the NP-hard optimization problem that POY addresses: minimizing treelength. Since this optimization depends upon the specific edit distance criterion used to score a tree, our study considers the impact of the gap penalty (in particular, affine versus simple) on the accuracy of the resultant alignment and tree that optimizes the treelength for that gap penalty function.
Our study suggests that the poor performance observed for POY by Ogden and Rosenberg is due to the simple gap penalties they used to score alignment/tree pairs, but also suggests the intriguing possibility that optimizing under an affine gap penalty might produce alignments that are not only better than ClustalW alignments, but competitive with (or perhaps better than) those produced by the best current alignment methods. This study also shows that optimizing under this affine gap penalty produces trees whose topological accuracy is better than ClustalW + MP, and competitive with the current best two-phase methods.
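
To make the affine-versus-simple distinction concrete, the following is a minimal sketch, not the scoring scheme used in the paper. It assumes that "simple" means a cost proportional to gap length and that "affine" means a one-time opening cost plus a per-position extension cost; the function names and the numeric penalties (`per_position=1.0`, `gap_open=4.0`, `gap_extend=1.0`) are hypothetical and chosen only for illustration.

```python
def simple_gap_cost(length, per_position=1.0):
    """Simple (linear) penalty: each gapped position costs the same amount."""
    return per_position * length


def affine_gap_cost(length, gap_open=4.0, gap_extend=1.0):
    """Affine penalty (assumed convention): open + extend * length for length > 0."""
    if length == 0:
        return 0.0
    return gap_open + gap_extend * length


def gap_lengths(aligned_row, gap_char="-"):
    """Return the lengths of the maximal runs of gap characters in one aligned row."""
    runs, current = [], 0
    for ch in aligned_row:
        if ch == gap_char:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def total_gap_cost(aligned_row, cost_fn):
    """Sum the chosen gap cost over every gap run in the row."""
    return sum(cost_fn(n) for n in gap_lengths(aligned_row))


one_long_gap = "AC---GT"      # a single gap of length 3
three_short_gaps = "A-C-G-T"  # three gaps of length 1

for row in (one_long_gap, three_short_gaps):
    print(row,
          total_gap_cost(row, simple_gap_cost),
          total_gap_cost(row, affine_gap_cost))
```

Under these assumed values the simple penalty cannot tell the two rows apart, because both contain three gapped positions, whereas the affine penalty charges the row with three separate gaps more than the row with one contiguous gap. This is one way to see why the choice of gap penalty can change which alignment/tree pair minimizes treelength.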
