Article

A Monte Carlo Approach Successfully Identifies Randomness in Multiple Sequence Alignments: A More Objective Means of Data Exclusion

SYSTEMATIC BIOLOGY (2009)

Journal

SYSTEMATIC BIOLOGY

Volume 58, Issue 1, Pages 21-34

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/sysbio/syp006

Keywords

Alignment ambiguity; alignment quality; ALISCORE; resampling; scoring; substitutional saturation

Category

Evolutionary Biology

Funding

Deutsche Forschungsgemeinschaft (DFG) [MI 649/7-1]

Abstract

Random similarity of sequences or sequence sections can impede phylogenetic analyses or the identification of gene homologies. Additionally, randomly similar sequences or ambiguously aligned sequence sections can interfere with the estimation of substitution model parameters. Phylogenomic studies have shown that biases in model estimation and tree reconstruction do not disappear even with large data sets; in fact, these biases can become more pronounced with more data. It is therefore important to identify possible random similarity within sequence alignments before model estimation and tree reconstruction. Several approaches have already been suggested to identify and treat problematic alignment sections. We propose an alternative method that identifies random similarity within multiple sequence alignments (MSAs) based on Monte Carlo resampling within a sliding window. The method infers similarity profiles from pairwise sequence comparisons and subsequently calculates a consensus profile, which summarizes all single similarity profiles. Consequently, consensus profiles identify dominating patterns of nonrandom similarity or randomness within sections of MSAs. We show that the approach clearly identifies randomness in simulated and real data. After the exclusion of putative random sections, node support drastically improves in tree reconstructions for both data sets. The method thus appears to be a powerful tool to identify possible biases in tree reconstruction or gene identification. It is currently restricted to nucleotide data but will be extended to protein data in the near future.
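
The general idea in the abstract (pairwise similarity profiles from a sliding window, judged against Monte Carlo resampling, then averaged into a consensus profile) can be illustrated with a minimal Python sketch. This is not the ALISCORE implementation. The scoring scheme (+1 per match, -1 per mismatch), the 95% cutoff, the null model that draws bases from the pooled composition of each sequence pair once per pair, the missing gap handling, and all function and parameter names are assumptions made for illustration only.

```python
import random
from itertools import combinations


def window_score(a, b):
    """Score two equal-length windows: +1 per match, -1 per mismatch."""
    return sum(1 if x == y else -1 for x, y in zip(a, b))


def pair_profile(s1, s2, width, n_resamples, rng):
    """Similarity profile for one sequence pair.

    A window gets +1 if its observed score exceeds the 95% cutoff of
    scores from randomly drawn window pairs, otherwise -1.
    """
    pool = s1 + s2  # assumed null model: pooled base composition of the pair
    null = sorted(
        window_score(rng.choices(pool, k=width), rng.choices(pool, k=width))
        for _ in range(n_resamples)
    )
    cutoff = null[min(int(0.95 * n_resamples), n_resamples - 1)]
    return [
        1 if window_score(s1[i:i + width], s2[i:i + width]) > cutoff else -1
        for i in range(len(s1) - width + 1)
    ]


def consensus_profile(seqs, width=10, n_resamples=100, seed=0):
    """Average the pairwise profiles window by window (values in [-1, 1])."""
    rng = random.Random(seed)
    profiles = [
        pair_profile(a, b, width, n_resamples, rng)
        for a, b in combinations(seqs, 2)
    ]
    return [sum(column) / len(profiles) for column in zip(*profiles)]


# Toy alignment: a shared block followed by a block where all sequences differ.
alignment = [
    "ACGT" * 5 + "A" * 20,
    "ACGT" * 5 + "C" * 20,
    "ACGT" * 5 + "G" * 20,
]
profile = consensus_profile(alignment)
```

In this toy alignment, windows that lie entirely within the shared block should receive a consensus value near 1, and windows entirely within the divergent block a value of -1. Under this sketch's assumptions, windows with low consensus values would be the candidates for exclusion.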
