Article

MArVD2: a machine learning enhanced tool to discriminate between archaeal and bacterial viruses in viral datasets

ISME COMMUNICATIONS (2023)

Journal

ISME COMMUNICATIONS

Volume 3, Issue 1, Pages -

Publisher

SPRINGERNATURE

DOI: 10.1038/s43705-023-00295-9

Categories

Ecology Microbiology

AI summary

With the advancement of sequencing technologies and large-scale sampling and analytical efforts, our understanding of viral sequence space has greatly expanded. However, our knowledge of archaeal viruses outside extreme environments is limited due to the lack of a reliable and systematic approach to distinguish between bacterial and archaeal viruses in curated datasets. In this study, we upgraded our previous text-based tool (MArVD) by training and testing a random forest machine learning algorithm using a newly curated dataset of archaeal viruses. The optimized MArVD2 showed significant improvements in scalability, usability, and flexibility, and can accommodate user-defined training datasets as new archaeal viruses are discovered.

Abstract

Our knowledge of viral sequence space has exploded with advancing sequencing technologies and large-scale sampling and analytical efforts. Though archaea are important and abundant prokaryotes in many systems, our knowledge of archaeal viruses outside of extreme environments is limited. This largely stems from the lack of a robust, high-throughput, and systematic way to distinguish between bacterial and archaeal viruses in datasets of curated viruses. Here we upgrade our prior text-based tool (MArVD) via training and testing a random forest machine learning algorithm against a newly curated dataset of archaeal viruses. After optimization, MArVD2 presented a significant improvement over its predecessor in terms of scalability, usability, and flexibility, and will allow user-defined custom training datasets as archaeal virus discovery progresses. Benchmarking showed that a model trained with viral sequences from the hypersaline, marine, and hot spring environments correctly classified 85% of the archaeal viruses with a false detection rate below 2% using a random forest prediction threshold of 80% in a separate benchmarking dataset from the same habitats.
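The benchmarking figures in the abstract depend on how a prediction threshold turns per-sequence random forest probabilities into archaeal or bacterial calls. The following is a minimal sketch of that arithmetic, not MArVD2's own code. The function name `evaluate_at_threshold`, the toy probabilities, and the reading of "false detection rate" as false positives divided by all sequences called archaeal are assumptions made for illustration; the paper may define the metric differently.

```python
def evaluate_at_threshold(archaeal_probs, is_archaeal, threshold=0.80):
    """Score archaeal-virus calls made at a given probability threshold.

    archaeal_probs: per-sequence probability of being an archaeal virus
                    (for example, the output of a random forest model).
    is_archaeal:    matching list of ground-truth booleans.
    """
    if len(archaeal_probs) != len(is_archaeal):
        raise ValueError("probabilities and labels must have equal length")

    # A sequence is called archaeal only if its probability meets the threshold.
    predicted = [p >= threshold for p in archaeal_probs]

    tp = sum(1 for pred, truth in zip(predicted, is_archaeal) if pred and truth)
    fp = sum(1 for pred, truth in zip(predicted, is_archaeal) if pred and not truth)
    fn = sum(1 for pred, truth in zip(predicted, is_archaeal) if not pred and truth)

    # Fraction of true archaeal viruses that were recovered.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # ASSUMPTION: false detection rate = wrong archaeal calls / all archaeal calls.
    false_detection_rate = fp / (tp + fp) if (tp + fp) else 0.0

    return {
        "recall": recall,
        "false_detection_rate": false_detection_rate,
        "predicted_archaeal": tp + fp,
    }


# Hypothetical toy data, not taken from the paper.
probs = [0.95, 0.85, 0.60, 0.90, 0.10, 0.30]
truth = [True, True, True, False, False, False]

metrics = evaluate_at_threshold(probs, truth)          # default threshold of 0.80
relaxed = evaluate_at_threshold(probs, truth, 0.50)    # lower threshold for comparison
```

On this toy data, lowering the threshold from 0.80 to 0.50 recovers the third archaeal sequence while keeping the one bacterial sequence that was wrongly called archaeal. Whether a lower threshold helps or hurts the false detection rate depends on the dataset, so a threshold is best chosen against a benchmarking set.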
