Article

A counting renaissance: combining stochastic mapping and empirical Bayes to quickly detect amino acid sites under positive selection

Journal

BIOINFORMATICS
Volume 28, Issue 24, Pages 3248-3256

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/bts580

Funding

  1. European Research Council under the European Community [278433-PREDEMICS]
  2. ERC [260864]
  3. National Science Foundation [DMS 0856099]
  4. National Institutes of Health [R01 GM086887, R01 HG006139]
  5. Bioinformatics, Statistical Analysis and Evolutionary Core of the UCSD Center for AIDS Research [5P30AI36214]
  6. National Evolutionary Synthesis Center (NESCent)
  7. Division Of Mathematical Sciences
  8. Direct For Mathematical & Physical Scien [0856099] Funding Source: National Science Foundation

MOTIVATION: Statistical methods for comparing relative rates of synonymous and non-synonymous substitutions maintain a central role in detecting positive selection. To identify selection, researchers often estimate the ratio of these relative rates (dN/dS) at individual alignment sites. Fitting a codon substitution model that captures heterogeneity in dN/dS across sites provides a reliable way to perform such estimation, but it remains computationally prohibitive for massive datasets. By using crude estimates of the numbers of synonymous and non-synonymous substitutions at each site, counting approaches scale well to large datasets, but they fail to account for ancestral state reconstruction uncertainty and to provide site-specific dN/dS estimates.

RESULTS: We propose a hybrid solution that borrows the computational strength of counting methods, but augments these methods with empirical Bayes modeling to produce a relatively fast and reliable method capable of estimating site-specific dN/dS values in large datasets. Importantly, our hybrid approach, set in a Bayesian framework, integrates over the posterior distribution of phylogenies and ancestral reconstructions to quantify uncertainty about site-specific dN/dS estimates. Simulations demonstrate that this method competes well with more-principled statistical procedures and, in some cases, even outperforms them. We illustrate the utility of our method using human immunodeficiency virus, feline panleukopenia and canine parvovirus evolution examples.
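
The general idea of combining per-site substitution counts with empirical Bayes shrinkage can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a simple gamma-Poisson model, fits the gamma prior by a crude method of moments, and ignores phylogenetic and ancestral-state uncertainty entirely. The function names (`shrink_rates`, `site_omega`) and the "exposure" inputs (the expected number of substitutions at a site under neutrality) are hypothetical, introduced only for illustration.

```python
from statistics import mean, pvariance


def shrink_rates(counts, exposure):
    """Shrink per-site rates toward the pooled mean (toy gamma-Poisson model).

    Assumed model: count_i ~ Poisson(rate_i * exposure_i), rate_i ~ Gamma(alpha, beta).
    The prior is fitted by matching the mean and variance of the raw rates,
    which is a deliberate simplification.
    """
    raw = [c / e for c, e in zip(counts, exposure)]
    m, v = mean(raw), pvariance(raw)
    if m == 0 or v == 0:
        # No spread to learn a prior from: every site gets the pooled mean.
        return [m] * len(raw)
    alpha, beta = m * m / v, m / v
    # Posterior mean of a gamma prior updated with a Poisson count.
    return [(alpha + c) / (beta + e) for c, e in zip(counts, exposure)]


def site_omega(n_counts, s_counts, n_exposure, s_exposure):
    """Ratio of shrunken non-synonymous to synonymous rates at each site."""
    n_rate = shrink_rates(n_counts, n_exposure)
    s_rate = shrink_rates(s_counts, s_exposure)
    return [n / s if s > 0 else float("nan") for n, s in zip(n_rate, s_rate)]
```

Calling `site_omega` with one list of non-synonymous counts, one list of synonymous counts and the matching exposures returns one ratio per site. Sites with extreme raw counts are pulled toward the alignment-wide average, which is the stabilising effect that shrinkage is meant to provide when per-site counts are small.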
