Article

Assessing the performance of DNA barcoding using posterior predictive simulations

Journal

MOLECULAR ECOLOGY
Volume 25, Issue 9, Pages 1944-1957

Publisher

WILEY
DOI: 10.1111/mec.13590

Keywords

biodiversity; clustering algorithms; genetic distances; model adequacy; operational taxonomic units; substitution models

Funding

  1. Arnold O. Beckman Postdoctoral Fellowship
  2. NSF [DBI 1356796, DEB 1354506]
  3. National Science Foundation, Directorate for Biological Sciences, Division of Environmental Biology [1354506]
  4. National Science Foundation, Directorate for Biological Sciences, Division of Biological Infrastructure [1356796]
  5. National Science Foundation, Directorate for Computer and Information Science and Engineering, Office of Advanced Cyberinfrastructure (OAC) [1341935]

Accurate estimates of biodiversity are required for research in a broad array of biological subdisciplines including ecology, evolution, systematics, conservation and biodiversity science. The use of statistical models and genetic data, particularly DNA barcoding, has been suggested as an important tool for remedying the large gaps in our current understanding of biodiversity. However, the reliability of biodiversity estimates obtained using these approaches depends on how well the statistical models that are used describe the evolutionary process underlying the genetic data. In this study, we utilize data from the Barcode of Life Database and posterior predictive simulations to assess the performance of DNA barcoding under commonly used substitution models. We demonstrate that the success of DNA barcoding varies widely across DNA substitution models and that model choice has a substantial impact on the number of operational taxonomic units identified (changing results by approximately 4-31%). Additionally, we demonstrate that the widely followed practice of a priori assuming the Kimura 2-parameter model for DNA barcoding is statistically unjustified and should be avoided. Using both data-based and inference-based test statistics, we detect variation in model performance across taxonomic groups, clustering algorithms, genetic divergence thresholds and substitution models. Taken together, these results illustrate the importance of considering both model selection and model adequacy in studies quantifying biodiversity.
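To make concrete how the choice of substitution model can change the number of operational taxonomic units, the following is a minimal Python sketch. It is not the authors' pipeline and does not implement posterior predictive simulation. It only compares the standard Jukes-Cantor (JC69) and Kimura 2-parameter (K2P) pairwise distance formulas and counts clusters under a fixed divergence threshold. The function names, the single-linkage clustering rule, the toy sequences and the 0.11 threshold are all assumptions chosen for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def site_differences(seq_a, seq_b):
    """Return (P, Q): proportions of transitions and transversions
    over aligned sites where both sequences have an unambiguous base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return transitions / compared, transversions / compared


def jc69_distance(seq_a, seq_b):
    """JC69 distance: d = -(3/4) ln(1 - (4/3) p), with p = P + Q."""
    p, q = site_differences(seq_a, seq_b)
    arg = 1 - 4 * (p + q) / 3
    # A non-positive argument means the pair is saturated under this model.
    return -0.75 * math.log(arg) if arg > 0 else math.inf


def k2p_distance(seq_a, seq_b):
    """K2P distance: d = -(1/2) ln((1 - 2P - Q) * sqrt(1 - 2Q))."""
    p, q = site_differences(seq_a, seq_b)
    a1, a2 = 1 - 2 * p - q, 1 - 2 * q
    if a1 <= 0 or a2 <= 0:
        return math.inf  # saturated under this model
    return -0.5 * math.log(a1 * math.sqrt(a2))


def count_otus(seqs, distance, threshold):
    """Count single-linkage clusters: two sequences share a cluster if a
    chain of pairwise distances <= threshold connects them (union-find)."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(seqs))})


# Toy alignment (hypothetical, not data from the study).
toy = ["AAAAAAAAAA", "AAAAAAAAAG", "CCCCCCCCCC"]
otus_jc = count_otus(toy, jc69_distance, threshold=0.11)
otus_k2p = count_otus(toy, k2p_distance, threshold=0.11)
```

In this toy case the first two sequences differ by a single transition. Their JC69 distance falls just under the assumed threshold while their K2P distance falls just over it, so the same data yield a different cluster count under the two models. Real barcoding analyses use longer alignments, empirically motivated thresholds and other clustering algorithms, so the size of any such effect will differ.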
