Long-Read Metagenomics Improves the Recovery of Viral Diversity from Complex Natural Marine Samples

Journal

MSYSTEMS
Volume 7, Issue 3

Publisher

AMER SOC MICROBIOLOGY
DOI: 10.1128/msystems.00192-22

Keywords

PacBio CCS long reads; bacteriophage; long-read sequencing; metagenome; viral diversity; virome

Funding

  1. Spanish Ministerio de Economia, Industria y Competitividad [VIREVO CGL2016-76273-P, FLEX3GEN PID2020-118052GB-I00]
  2. FEDER funds
  3. Generalitat Valenciana [HIDRAS3 PROMETEU/2019/009]
  4. Spanish Ministerio de Economia y Competitividad [BES-2017-079993]

In this study, the recovery of viral diversity from marine samples using long-read sequencing was explored. The results showed that a significant portion of marine viral diversity was directly recovered by PacBio circular consensus sequencing (CCS) reads, with some sequences not being detected in the short- and long-read assembly. Additionally, the hybrid assembly of long and short reads improved the length and host assignment of the viral sequences.
The recovery of DNA from viromes is a major obstacle in the use of long-read sequencing to study their genomes. For this reason, the use of cellular metagenomes (>0.2-µm size range) emerges as an interesting complementary tool, since they contain large amounts of naturally amplified viral genomes from prelytic replication. We applied second-generation (Illumina NextSeq; short reads) and third-generation (PacBio Sequel II; long reads) sequencing to compare the diversity and features of the viral community in a marine sample obtained from offshore waters of the western Mediterranean. We found that a major fraction of the expected marine viral diversity was directly recovered by the raw PacBio circular consensus sequencing (CCS) reads. More than 30,000 sequences were detected only in this data set, with no homologues in the long- and short-read assemblies, and ca. 26,000 had no homologues in the large data set of the Global Ocean Virome 2 (GOV2), highlighting the information gap created by assembly bias. At the level of complete viral genomes, the performance of both approaches was similar. However, the hybrid long- and short-read assembly provided the longest average sequence length and improved host assignment. Although no novel major clades of viruses were found, long reads recovered more intraclade genomic diversity, which gave a richer assessment of the real diversity and allowed the discovery of novel genes with biotechnological potential (e.g., endolysin genes).
IMPORTANCE We explored the vast genetic diversity of environmental viruses by combining cellular metagenome (as opposed to virome) sequencing with high-fidelity long reads (in this case, PacBio CCS). This approach recovered a representative sample of the viral population, and it performed better (more phage contigs, larger average contig size) than Illumina sequencing applied to the same sample. With this approach, the many biases of assembly are avoided, as the CCS reads (typically around 5 kb) recover complete genes and even operons, resulting in a better discovery of viral gene diversity based on viral marker proteins. Thus, biotechnologically promising genes, such as endolysin genes, can be searched for very efficiently. In addition, hybrid assembly produces more complete and longer contigs, which is particularly important for studying little-known viral groups such as the nucleocytoplasmic large DNA viruses (NCLDV).
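
The abstract reports CCS reads that have no homologues in the assemblies or in GOV2, but it does not state how homology was decided. As a rough illustration only, the comparison can be thought of as a set difference between all read identifiers and those with an acceptable hit. In the Python sketch below, the `Hit` record, the function name, and both thresholds (`min_identity`, `min_read_cov`) are hypothetical and are not taken from the paper.

```python
from typing import Iterable, NamedTuple, Set


class Hit(NamedTuple):
    """One alignment between a CCS read and a target sequence (hypothetical record)."""
    read_id: str     # CCS read used as the query
    target_id: str   # assembled contig or reference sequence that was matched
    identity: float  # percent identity of the alignment (0-100)
    read_cov: float  # fraction of the read covered by the alignment (0-1)


def reads_without_homologues(
    read_ids: Iterable[str],
    hits: Iterable[Hit],
    min_identity: float = 70.0,  # assumed cutoff, not from the paper
    min_read_cov: float = 0.5,   # assumed cutoff, not from the paper
) -> Set[str]:
    """Return the reads that have no hit passing both thresholds."""
    matched = {
        h.read_id
        for h in hits
        if h.identity >= min_identity and h.read_cov >= min_read_cov
    }
    # Reads never seen among the accepted hits count as "no homologue".
    return set(read_ids) - matched


if __name__ == "__main__":
    reads = ["ccs_1", "ccs_2", "ccs_3"]
    hits = [
        Hit("ccs_1", "contig_9", 95.0, 0.9),  # passes both thresholds
        Hit("ccs_2", "contig_4", 60.0, 0.8),  # identity below the cutoff
    ]
    print(sorted(reads_without_homologues(reads, hits)))
```

Running the same function once against the assembly hits and once against hits to a reference collection would give the two kinds of counts the abstract mentions, with the caveat that the resulting numbers depend entirely on the thresholds chosen.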
