Article

HAYSTAC: A Bayesian framework for robust and rapid species identification in high-throughput sequencing data

PLOS COMPUTATIONAL BIOLOGY (2022)

Journal

PLOS COMPUTATIONAL BIOLOGY

Volume 18, Issue 9

Publisher

PUBLIC LIBRARY SCIENCE

DOI: 10.1371/journal.pcbi.1010493

Categories

Biochemical Research Methods; Mathematical & Computational Biology

Funding

NERC [NE/L002612/1]
European Research Council [ERC-2013-StG-337574-UNDEAD, ERC2019StG-853272-PALAEOFARM]
Natural Environment Research Council [NE/K005243/1, NE/K003259/1, NE/S007067/1, NE/S00078X/1]
Wellcome Trust [210119/Z/18/Z]
DTP in Environmental Research

Summary

HAYSTAC is a high-accuracy, scalable method for taxonomic assignment of metagenomic data that estimates the probability that a specific taxon is present in a sample. It is designed to handle both ancient and modern DNA data efficiently and can run high-accuracy, hypothesis-driven analyses on incomplete reference databases.

Abstract

Identification of specific species in metagenomic samples is critical for several key applications, yet many available tools require large amounts of computational power and are often prone to false-positive identifications. Here we describe High-AccuracY and Scalable Taxonomic Assignment of MetagenomiC data (HAYSTAC), which can estimate the probability that a specific taxon is present in a metagenome. HAYSTAC provides a user-friendly tool to construct databases, based on publicly available genomes, that are used for competitive read mapping. It then uses a novel Bayesian framework to infer the abundance and statistical support for each species identification and to provide per-read species classification. Unlike other methods, HAYSTAC is specifically designed to efficiently handle both ancient and modern DNA data, as well as incomplete reference databases, making it possible to run highly accurate hypothesis-driven analyses (i.e., assessing the presence of a specific species) on variably sized reference databases while dramatically improving processing speeds. We tested the performance and accuracy of HAYSTAC using simulated Illumina libraries, both with and without ancient DNA damage, and compared the results with those of other currently available methods (i.e., Kraken2/Bracken, KrakenUniq, MALT/HOPS, and Sigma). HAYSTAC identified fewer false positives than Kraken2/Bracken, KrakenUniq and MALT in all simulations, and fewer than Sigma in simulations of ancient data. It uses less memory than Kraken2/Bracken, KrakenUniq and MALT, both during database construction and during sample analysis. Lastly, we used HAYSTAC to search for specific pathogens in two published ancient metagenomic datasets, demonstrating how it can be applied to empirical data. HAYSTAC is available from https://github.com/antonisdim/HAYSTAC.
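The abstract describes competitive read mapping followed by Bayesian inference of taxon abundances and per-read species assignments. As a rough illustration of that general kind of inference, the sketch below fits a generic mixture model by expectation-maximisation: each read's posterior over taxa is computed with Bayes' rule, and abundances are re-estimated from those posteriors. It is not HAYSTAC's actual model or code. The input format (mismatch counts per read and taxon), the independent per-base error model, the `error_rate` default, and the names `log_likelihood`, `per_read_posteriors` and `estimate_abundances` are all hypothetical assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical input format (assumption): for each read, a dict mapping taxon name
# to the number of mismatches in that read's best alignment to the taxon's genome.
# Taxa absent from a read's dict are treated as "the read did not align there".
Alignments = List[Dict[str, int]]


def log_likelihood(mismatches: int, read_length: int, error_rate: float) -> float:
    """Log P(read | taxon) under an assumed independent per-base error model."""
    return mismatches * math.log(error_rate) + (read_length - mismatches) * math.log(1.0 - error_rate)


def per_read_posteriors(
    alignments: Alignments,
    read_lengths: List[int],
    priors: Dict[str, float],
    error_rate: float,
) -> List[Dict[str, float]]:
    """Bayes' rule per read: P(t | r) is proportional to P(t) * P(r | t) over aligned taxa."""
    results: List[Dict[str, float]] = []
    for aln, length in zip(alignments, read_lengths):
        # Work in log space; taxa with zero prior weight are skipped.
        log_w = {
            taxon: math.log(priors[taxon]) + log_likelihood(m, length, error_rate)
            for taxon, m in aln.items()
            if priors.get(taxon, 0.0) > 0.0
        }
        if not log_w:
            results.append({})  # read left unassigned
            continue
        top = max(log_w.values())  # subtract the maximum to avoid underflow
        unnorm = {taxon: math.exp(v - top) for taxon, v in log_w.items()}
        total = sum(unnorm.values())
        results.append({taxon: w / total for taxon, w in unnorm.items()})
    return results


def estimate_abundances(
    alignments: Alignments,
    read_lengths: List[int],
    taxa: List[str],
    error_rate: float = 0.01,
    n_iter: int = 100,
) -> Tuple[Dict[str, float], List[Dict[str, float]]]:
    """EM: alternate per-read posteriors (E-step) and abundance updates (M-step)."""
    priors = {taxon: 1.0 / len(taxa) for taxon in taxa}  # start from a uniform prior
    for _ in range(n_iter):
        posteriors = per_read_posteriors(alignments, read_lengths, priors, error_rate)
        assigned = [p for p in posteriors if p]
        if not assigned:
            break
        counts = {taxon: 0.0 for taxon in taxa}
        for p in assigned:
            for taxon, weight in p.items():
                counts[taxon] += weight
        # New abundance = expected fraction of assigned reads belonging to each taxon.
        priors = {taxon: counts[taxon] / len(assigned) for taxon in taxa}
    return priors, per_read_posteriors(alignments, read_lengths, priors, error_rate)
```

For example, `estimate_abundances([{"A": 0, "B": 3}, {"A": 0}, {"B": 0}], [50, 50, 50], ["A", "B"])` assigns the first read almost entirely to taxon A, because it matches A with no mismatches and B with three, and converges to an abundance of roughly two thirds for A. Competitive mapping matters in this framing because a read is only weighed against the taxa it actually aligned to.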
