Article

Alignment-free d2* oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences

Journal

Nucleic Acids Research
Volume 45, Issue 1, Pages 39-53

Publisher

Oxford University Press
DOI: 10.1093/nar/gkw1002

Funding

  1. National Science Foundation [OCE 1136818, DMS-1518001]
  2. Gordon and Betty Moore Foundation Marine Microbiology Initiative [GBMF3779]
  3. University of Southern California
  4. National Science Foundation Directorate for Geosciences
  5. National Science Foundation Division of Ocean Sciences [1136818]

Viruses and their host genomes often share similar oligonucleotide frequency (ONF) patterns, which can be used to predict the host of a given virus by finding the host with the greatest ONF similarity. We comprehensively compared 11 ONF metrics using several k-mer lengths for predicting host taxonomy from among 32 000 prokaryotic genomes for 1427 virus isolate genomes whose true hosts are known. The background-subtracting measure d2* at k = 6 gave the highest host prediction accuracy (33%, genus level) with reasonable computational times. Requiring a maximum dissimilarity score for making predictions (thresholding) and taking the consensus of the 30 most similar hosts further improved accuracy. Using a previous dataset of 820 bacteriophage and 2699 bacterial genomes, d2* host prediction accuracies with thresholding and consensus methods (genus level: 64%) exceeded those of previous Euclidean distance ONF (32%) or homology-based (22-62%) methods. When applied to metagenomically assembled marine SUP05 viruses and the human gut virus crAssphage, d2*-based predictions overlapped (i.e. some same, some different) with the previously inferred hosts of these viruses. The extent of overlap improved when only host genomes or metagenomic contigs from the same habitat or samples as the query viruses were used. The d2* ONF method will greatly improve the characterization of novel, metagenomic viruses.
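To make the abstract's pipeline concrete, the following is a minimal sketch of one common formulation of d2*: each sequence's k-mer counts are centred by subtracting the count expected under that sequence's own background model and scaled by the square root of that expectation, and d2* is half of one minus the cosine of the two resulting vectors, so it lies in [0, 1]. Several details are assumptions rather than the authors' implementation: the background is an i.i.d. (order-0 Markov) model, k-mers are counted on the forward strand only, and the threshold-plus-consensus step (abstain if even the closest host exceeds a maximum dissimilarity, otherwise take the majority genus among the `top_n` closest hosts) is one interpretation of the abstract. The function names `kmer_counts`, `standardized_vector`, `d2star` and `predict_host` are hypothetical, and no threshold value is assumed.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping k-mers on the forward strand only, skipping any with non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(c in ALPHABET for c in word):
            counts[word] += 1
    return counts


def standardized_vector(seq, k):
    """Map each k-mer w to (X_w - E[X_w]) / sqrt(E[X_w]).

    Assumption: E[X_w] comes from an i.i.d. (order-0 Markov) background fitted to this sequence.
    """
    counts = kmer_counts(seq, k)
    n_words = sum(counts.values())
    if n_words == 0:
        raise ValueError("sequence has no valid k-mers")
    bases = Counter(c for c in seq.upper() if c in ALPHABET)
    total = sum(bases.values())
    freqs = {b: bases[b] / total for b in ALPHABET}
    vec = {}
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        p_w = math.prod(freqs[b] for b in word)
        if p_w == 0:
            continue  # expected count is zero, so the scaled term is undefined; skip it
        expected = n_words * p_w
        vec[word] = (counts[word] - expected) / math.sqrt(expected)
    return vec


def d2star(seq_x, seq_y, k=6):
    """d2* = (1 - cosine similarity of the standardized centred count vectors) / 2."""
    vx = standardized_vector(seq_x, k)
    vy = standardized_vector(seq_y, k)
    dot = sum(vx[w] * vy[w] for w in vx.keys() & vy.keys())
    norm = math.sqrt(sum(v * v for v in vx.values())) * math.sqrt(sum(v * v for v in vy.values()))
    if norm == 0:
        return 0.5  # degenerate case (no deviation from background): treat as uncorrelated
    return 0.5 * (1.0 - dot / norm)


def predict_host(virus_seq, hosts, k=6, top_n=30, max_dissimilarity=None):
    """Predict a host genus by consensus among the top_n least-dissimilar hosts.

    hosts is a list of (genus, genome) pairs. Returns None (no prediction) if there are
    no hosts or if even the closest host is above max_dissimilarity (thresholding).
    """
    scored = sorted((d2star(virus_seq, genome, k), genus) for genus, genome in hosts)
    if not scored:
        return None
    if max_dissimilarity is not None and scored[0][0] > max_dissimilarity:
        return None
    votes = Counter(genus for _, genus in scored[:top_n])
    return votes.most_common(1)[0][0]
```

For example, `predict_host(virus, [("A", genome_1), ("B", genome_2)], top_n=30)` returns the genus that appears most often among the 30 closest genomes. Scoring every host genome independently keeps the sketch simple; a real search over tens of thousands of genomes would precompute each host's standardized vector once instead of recomputing it per query.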
