Article

Learning to Count: Robust Estimates for Labeled Distances between Molecular Sequences

MOLECULAR BIOLOGY AND EVOLUTION (2009)

Journal

MOLECULAR BIOLOGY AND EVOLUTION

Volume 26, Issue 4, Pages 801-814

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/molbev/msp003

Keywords

robust counting; labeled codon distance; empirical distribution; Markov chain substitution model

Categories

Biochemistry & Molecular Biology; Evolutionary Biology; Genetics & Heredity

Funding

UCLA Dissertation Year Fellowship
National Institute of General Medical Sciences Systems and Integrative Biology Training Grant
Alfred P. Sloan Research Fellowship
John Simon Guggenheim Memorial Fellowship
National Institutes of Health [R01 GM086887]

Abstract

Researchers routinely estimate distances between molecular sequences using continuous-time Markov chain models. We present a new method, robust counting, that protects against the possibly severe bias arising from model misspecification. We achieve this robustness by generalizing the conventional distance estimation to incorporate the empirical distribution of site patterns found in the observed pairwise sequence alignment. Our flexible framework allows for computing distances based only on a subset of possible substitutions. From this, we show how to estimate labeled codon distances, such as expected numbers of synonymous or nonsynonymous substitutions. We present two simulation studies. The first compares the relative bias and variance of conventional and robust labeled nucleotide estimators. In the second simulation, we demonstrate that robust counting furnishes accurate synonymous and nonsynonymous distance estimates based only on easy-to-fit models of nucleotide substitution, bypassing the need for computationally expensive codon models. We conclude with three empirical examples. In the first two examples, we investigate the evolutionary dynamics of the influenza A hemagglutinin gene using labeled codon distances. In the final example, we demonstrate the advantages of using robust synonymous distances to alleviate the effect of convergent evolution on phylogenetic analysis of an HIV transmission network.
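For readers unfamiliar with the conventional baseline the abstract refers to, the following is a minimal sketch of a pairwise distance under the simplest continuous-time Markov chain model, Jukes-Cantor (JC69). It is not the robust counting estimator and does not compute labeled distances. The choice of JC69, the function name `jc69_distance`, and the handling of gaps and ambiguous bases are assumptions made for illustration only.

```python
import math


def jc69_distance(seq_a: str, seq_b: str) -> float:
    """Conventional JC69 distance (expected substitutions per site).

    Illustrative only: JC69 assumes equal base frequencies and equal
    rates for every substitution type, which is exactly the kind of
    assumption that can be misspecified for real data.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    # Keep only alignment columns where both sequences carry an
    # unambiguous nucleotide (assumption: gaps and other symbols are skipped).
    valid = set("ACGT")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid
    ]
    if not pairs:
        raise ValueError("no comparable sites in the alignment")

    # Observed proportion of differing sites (the p-distance).
    p = sum(a != b for a, b in pairs) / len(pairs)

    # The JC69 correction is undefined once p reaches the saturation value 3/4.
    if p >= 0.75:
        return math.inf

    # d = -(3/4) * ln(1 - (4/3) * p)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    print(jc69_distance("ACGTACGT", "ACGTACGA"))
```

The correction inflates the raw proportion of differing sites to account for multiple substitutions at the same site, and its accuracy depends on the assumed model being adequate. That dependence is the source of the misspecification bias the abstract describes.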
