Article

Identifying viruses from metagenomic data using deep learning

Journal

QUANTITATIVE BIOLOGY
Volume 8, Issue 1, Pages 64-77

Publisher

HIGHER EDUCATION PRESS
DOI: 10.1007/s40484-019-0187-4

Keywords

metagenome; deep learning; virus identification; machine learning

Funding

  1. U.S. National Institutes of Health [R01GM120624]
  2. National Science Foundation [DMS-1518001]
  3. National Natural Science Foundation of China [11701546]
  4. Simons Collaboration on Computational Biogeochemical Modeling of Marine Ecosystems (CBIOMES) [549943]

Abstract

Background: The recent development of metagenomic sequencing makes it possible to massively sequence microbial genomes, including viral genomes, without the need for laboratory culture. Existing reference-based and gene homology-based methods are not efficient in identifying unknown viruses or short viral sequences from metagenomic data.

Methods: Here we developed a reference-free and alignment-free machine learning method, DeepVirFinder, for identifying viral sequences in metagenomic data using deep learning.

Results: Trained on sequences from viral RefSeq discovered before May 2015, and evaluated on those discovered after that date, DeepVirFinder outperformed the state-of-the-art method VirFinder at all contig lengths, achieving AUROC 0.93, 0.95, 0.97, and 0.98 for 300, 500, 1000, and 3000 bp sequences, respectively. Enlarging the training data with millions of additional purified viral sequences from metavirome samples further improved the accuracy for identifying virus groups that are under-represented. Applying DeepVirFinder to real human gut metagenomic samples, we identified 51,138 viral sequences belonging to 175 bins in patients with colorectal carcinoma (CRC). Ten bins were found to be associated with the cancer status, suggesting viruses may play important roles in CRC.

Conclusions: Powered by deep learning and high-throughput sequencing metagenomic data, DeepVirFinder significantly improved the accuracy of viral identification and will assist the study of viruses in the era of metagenomics.
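
The AUROC values reported above summarize how well a classifier's scores rank viral sequences above non-viral ones. As a reading aid, the following minimal sketch computes AUROC directly from its pairwise-ranking definition. It is not the authors' evaluation code: the function name `auroc` and the toy labels and scores are hypothetical and made up purely for illustration.

```python
def auroc(scores, labels):
    """Return the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration: 1 = viral contig, 0 = non-viral contig.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

# 8 of the 9 positive/negative pairs are ranked correctly, so this prints about 0.889.
print(auroc(scores, labels))
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a sense of scale for the 0.93 to 0.98 range quoted in the abstract. The quadratic pairwise loop is chosen for clarity; a rank-based computation would be preferable for large evaluation sets.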
