Article

Subsampled open-reference clustering creates consistent, comprehensive OTU definitions and scales to billions of sequences

PEERJ (2014)

Journal

PEERJ

Volume 2, Issue -, Pages -

Publisher

PEERJ INC

DOI: 10.7717/peerj.545

Keywords

OTU picking; Microbial ecology; Microbiome; Qiime; Bioinformatics

Category

Multidisciplinary Sciences

Funding

EPA STAR Graduate Fellowship
NSF IGERT [1144807]
Arizona's Technology and Research Initiative Fund
Alfred P. Sloan Foundation [2012-5-42 MBRP]
Direct For Education and Human Resources
Division Of Graduate Education [1144807] Funding Source: National Science Foundation

Abstract

We present a performance-optimized algorithm, subsampled open-reference OTU picking, for assigning marker gene (e.g., 16S rRNA) sequences generated on next-generation sequencing platforms to operational taxonomic units (OTUs) for microbial community analysis. This algorithm provides benefits over de novo OTU picking (clustering can be performed largely in parallel, reducing runtime) and closed-reference OTU picking (all reads are clustered, not only those that match a reference database sequence with high similarity). Because more of our algorithm can be run in parallel relative to classic open-reference OTU picking, it makes open-reference OTU picking tractable on massive amplicon sequence data sets (though on smaller data sets, classic open-reference OTU clustering is often faster). We illustrate that here by applying it to the first 15,000 samples sequenced for the Earth Microbiome Project (1.3 billion V4 16S rRNA amplicons). To the best of our knowledge, this is the largest OTU picking run ever performed, and we estimate that our new algorithm runs in less than one-fifth of the time that classic open-reference OTU picking would require. We show that subsampled open-reference OTU picking yields results that are highly correlated with those generated by classic open-reference OTU picking through comparisons on three well-studied data sets. An implementation of this algorithm is provided in the popular QIIME software package, which uses uclust for read clustering. All analyses were performed using QIIME's uclust wrappers, though we provide details (aided by the open-source code in our GitHub repository) that will allow implementation of subsampled open-reference OTU picking independently of QIIME (e.g., in a compiled programming language, where runtimes should be further reduced). Our analyses should generalize to other implementations of these OTU picking algorithms.
Finally, we present a comparison of parameter settings in QIIME's OTU picking workflows and make recommendations on settings for these free parameters to optimize runtime without reducing the quality of the results. These optimized parameters can vastly decrease the runtime of uclust-based OTU picking in QIIME.
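
The contrast the abstract draws between closed-reference, de novo, and subsampled open-reference picking can be illustrated with a toy sketch. It is not QIIME's implementation. It assumes a four-stage outline (closed-reference against the database, de novo clustering of a random subsample of the failures, closed-reference of all failures against the new centroids, and a final de novo pass on what remains), and it substitutes a naive positional identity for uclust. All function names, the `fraction` parameter, and the `New.Ref` / `New.Cleanup` prefixes are hypothetical.

```python
import random


def identity(a, b):
    """Fraction of matching positions; a naive stand-in for uclust similarity."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_reference(reads, refs, threshold):
    """Assign each read to the first reference it matches.

    Each read is handled independently, so this stage could run in parallel.
    Returns (otu_id -> read ids, list of reads that matched nothing).
    """
    assigned, failures = {}, []
    for read_id, seq in reads:
        hit = next((rid for rid, rseq in refs
                    if identity(seq, rseq) >= threshold), None)
        if hit is None:
            failures.append((read_id, seq))
        else:
            assigned.setdefault(hit, []).append(read_id)
    return assigned, failures


def de_novo(reads, threshold, prefix):
    """Greedy centroid clustering; inherently serial because each read
    is compared against the centroids created by earlier reads."""
    centroids, clusters = [], {}
    for read_id, seq in reads:
        hit = next((cid for cid, cseq in centroids
                    if identity(seq, cseq) >= threshold), None)
        if hit is None:
            hit = f"{prefix}{len(centroids)}"
            centroids.append((hit, seq))
        clusters.setdefault(hit, []).append(read_id)
    return clusters, centroids


def subsampled_open_reference(reads, refs, threshold=0.97, fraction=0.1, seed=0):
    rng = random.Random(seed)
    # Stage 1: closed-reference picking against the reference database.
    otus, failures = closed_reference(reads, refs, threshold)
    if not failures:
        return otus
    # Stage 2: de novo clustering of only a random subsample of the failures;
    # the resulting centroids act as new reference sequences.
    n = max(1, int(len(failures) * fraction))
    _, new_refs = de_novo(rng.sample(failures, n), threshold, "New.Ref")
    # Stage 3: closed-reference picking of all failures against the new centroids.
    new_otus, remaining = closed_reference(failures, new_refs, threshold)
    otus.update(new_otus)
    # Stage 4: de novo clustering of whatever still failed to match.
    cleanup, _ = de_novo(remaining, threshold, "New.Cleanup")
    otus.update(cleanup)
    return otus
```

In this sketch only stages 2 and 4 are serial, and stage 2 sees just a fraction of the failures, which is one way to read the abstract's claim that more of the work can run in parallel than in classic open-reference picking. Unlike closed-reference picking alone, every input read ends up in some OTU.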
