Article

Comprehensive benchmarking and ensemble approaches for metagenomic classifiers

Journal

GENOME BIOLOGY
Volume 18, Issue -, Pages -

Publisher

BMC
DOI: 10.1186/s13059-017-1299-7

Keywords

Metagenomics; Shotgun sequencing; Taxonomy; Classification; Comparison; Ensemble methods; Metaclassification; Pathogen detection

Funding

  1. Irma T. Hirschl and Monique Weill-Caulier Charitable Trusts
  2. Starr Cancer Consortium grants [I9-A9-071]
  3. Bert L. and N. Kuggie Vallee Foundation
  4. WorldQuant Foundation
  5. Pershing Square Sohn Cancer Research Alliance
  6. NASA [NNX14AH50G, NNX17AB26G]
  7. National Institutes of Health [R25EB020393, R01AI125416, R01ES021006]
  8. National Science Foundation [1120622]
  9. Bill and Melinda Gates Foundation [OPP1151054]
  10. Alfred P. Sloan Foundation [G-2015-13964]
  11. Tri-Institutional Training Program in Computational Biology and Medicine
  12. Clinical and Translational Science Center
  13. NASA [NNX17AB26G, 1002384] Funding Source: Federal RePORTER
  14. Direct For Mathematical & Physical Scien
  15. Division Of Mathematical Sciences [1120622] Funding Source: National Science Foundation
  16. Div Of Information & Intelligent Systems
  17. Direct For Computer & Info Scie & Enginr [1526742] Funding Source: National Science Foundation
  18. Bill & Melinda Gates Foundation - Grand Challenges Explorations Initiative [OPP1151054] Funding Source: researchfish

Background: One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited.

Results: In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. Tools were characterized on the basis of their ability to identify taxa at the genus, species, and strain levels, quantify relative abundances of taxa, and classify individual reads to the species level. Strikingly, the number of species identified by the 11 tools can differ by over three orders of magnitude on the same datasets. Various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which are especially important where they concern medically relevant species. Overall, pairing tools with different classification strategies (k-mer, alignment, marker) can combine their respective advantages.

Conclusions: This study provides positive and negative controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision, accuracy, and recall. We show that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
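
The strategies named in the abstract (abundance filtering, tool intersection, and ensemble approaches) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tool labels `tool_a` to `tool_c`, the thresholds, and the toy abundance profiles are all hypothetical and chosen only to show how each strategy changes precision and recall against a known truth set.

```python
from collections import Counter


def abundance_filter(profile, min_rel_abundance):
    """Return the taxa whose relative abundance meets the threshold."""
    return {taxon for taxon, frac in profile.items() if frac >= min_rel_abundance}


def ensemble_vote(calls_per_tool, min_tools):
    """Return the taxa reported by at least `min_tools` of the tools."""
    counts = Counter(taxon for calls in calls_per_tool for taxon in calls)
    return {taxon for taxon, n in counts.items() if n >= min_tools}


def precision_recall(predicted, truth):
    """Compute precision and recall of a predicted taxon set."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical truth set and per-tool relative abundances (toy data only).
truth = {"Escherichia coli", "Staphylococcus aureus", "Bacillus subtilis"}
profiles = {
    "tool_a": {"Escherichia coli": 0.50, "Staphylococcus aureus": 0.30,
               "Bacillus subtilis": 0.19, "Bacillus anthracis": 0.01},
    "tool_b": {"Escherichia coli": 0.60, "Staphylococcus aureus": 0.35,
               "Yersinia pestis": 0.05},
    "tool_c": {"Escherichia coli": 0.40, "Bacillus subtilis": 0.40,
               "Staphylococcus aureus": 0.195, "Bacillus anthracis": 0.005},
}

# Unfiltered calls: every taxon each tool reports.
raw_calls = [set(p) for p in profiles.values()]
union = set().union(*raw_calls)            # any tool reports the taxon
intersection = set.intersection(*raw_calls)  # every tool reports the taxon
majority = ensemble_vote(raw_calls, min_tools=2)

# Abundance filtering (assumed 2% cutoff) followed by a majority vote.
filtered_calls = [abundance_filter(p, 0.02) for p in profiles.values()]
filtered_majority = ensemble_vote(filtered_calls, min_tools=2)

for name, calls in [("union", union), ("intersection", intersection),
                    ("majority", majority),
                    ("filtered majority", filtered_majority)]:
    p, r = precision_recall(calls, truth)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this toy case the union keeps every false positive, the strict intersection removes them but also loses a true species that one tool missed, and the majority vote sits in between. Combining the vote with an abundance cutoff recovers the truth set here, although the abstract notes that such strategies are often insufficient on real environmental samples.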
