4.6 Article

MARVEL, a Tool for Prediction of Bacteriophage Sequences in Metagenomic Bins

期刊

FRONTIERS IN GENETICS
卷 9, 期 -, 页码 -

出版社

FRONTIERS MEDIA SA
DOI: 10.3389/fgene.2018.00304

关键词

phage; virus; microbiome; machine learning; random forest

资金

  1. Sao Paulo State Research Foundation (FAPESP) [2014/16450-8]
  2. Brazilian Federal Agency CAPES
  3. CNPq
  4. FAPESP [2011/50870-6]
  5. CAPES [3385/2013]

向作者/读者索取更多资源

Here we present MARVEL, a tool for prediction of double-stranded DNA bacteriophage sequences in metagenomic bins. MARVEL uses a random forest machine learning approach. We trained the program on a dataset with 1,247 phage and 1,029 bacterial genomes, and tested it on a dataset with 335 bacterial and 177 phage genomes. We show that three simple genomic features extracted from contig sequences were sufficient to achieve a good performance in separating bacterial from phage sequences: gene density, strand shifts, and fraction of significant hits to a viral protein database. We compared the performance of MARVEL to that of VirSorter and VirFinder, two popular programs for predicting viral sequences. Our results show that all three programs have comparable specificity, but MARVEL achieves much better performance on the recall (sensitivity) measure. This means that MARVEL should be able to identify many more phage sequences in metagenomic bins than heretofore has been possible. In a simple test with real data, containing mostly bacterial sequences, MARVEL classified 58 out of 209 bins as phage genomes; other evidence suggests that 57 of these 58 bins are novel phage sequences.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.6
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据