4.7 Article

Hybrid assembly of ultra-long Nanopore reads augmented with 10x-Genomics contigs: Demonstrated with a human genome

Journal

GENOMICS
Volume 111, Issue 6, Pages 1896-1901

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.ygeno.2018.12.013

Keywords

3GS (3rd generation sequencing); 10x Genomics; DBG2OLC; Sparc; Hybrid assembly; Nanopore; Human genome

Funding

  1. National Science Foundation of China [71473243]
  2. Cloud-Ridge Industry Technology Leader Grant, A China-US International Cooperation Project on Genomics/Metagenomics Big Data

Third-generation sequencing (3GS) technologies generate ultra-long reads (up to 1 Mb), which make it possible to eliminate gaps and effectively resolve repeats in genome assembly. However, 3GS technologies suffer from high base-level error rates (15%-40%) and high sequencing costs. To address these issues, the hybrid assembly strategy, which uses both 3GS reads and inexpensive NGS (next-generation sequencing) short reads, was developed. Here, we use 10x-Genomics® technology, which combines a novel barcoding strategy with Illumina® NGS and has the advantage of revealing long-range sequence information, in place of common NGS short reads for hybrid assembly of long, error-prone 3GS reads. We demonstrate the feasibility of integrating 3GS with 10x-Genomics technology as a new strategy for hybrid de novo genome assembly, using the DBG2OLC and Sparc software packages previously developed by the authors for regular hybrid assembly. Using a human genome as an example, we show that with only 7x coverage of ultra-long Nanopore® reads, augmented with 10x reads, our approach achieves nearly the same level of quality as a non-hybrid assembly with 35x coverage of Nanopore reads. Compared with an assembly from 10x-Genomics reads alone, our assembly is gapless at a slightly higher cost. These results suggest that our new hybrid assembly of ultra-long 3GS reads augmented with 10x-Genomics reads offers a low-cost (less than 1/4 the cost of the non-hybrid assembly) and computationally lightweight (only 109 calendar hours, with a peak memory usage of 61 GB, on a dual-CPU office workstation) solution for extending the applications of 3GS technologies.
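
The coverage figures in the abstract can be put in concrete terms with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes the usual definition of mean coverage (total sequenced bases divided by genome size) and an approximate human genome size of 3.1 Gb, and all function and variable names are hypothetical. It shows only how much less Nanopore data 7x coverage requires than 35x; it does not model sequencing cost, which also depends on the 10x-Genomics data.

```python
# Assumption: approximate haploid human genome size in bases (not stated in the abstract).
HUMAN_GENOME_BASES = 3.1e9


def bases_for_coverage(coverage: float, genome_size: float = HUMAN_GENOME_BASES) -> float:
    """Total sequenced bases needed to reach a given mean coverage."""
    return coverage * genome_size


def coverage_from_bases(total_bases: float, genome_size: float = HUMAN_GENOME_BASES) -> float:
    """Mean coverage obtained from a given number of sequenced bases."""
    return total_bases / genome_size


# Coverage values quoted in the abstract.
hybrid_nanopore_coverage = 7.0
non_hybrid_nanopore_coverage = 35.0

hybrid_bases = bases_for_coverage(hybrid_nanopore_coverage)
non_hybrid_bases = bases_for_coverage(non_hybrid_nanopore_coverage)

# Fraction of Nanopore data the hybrid approach needs relative to the non-hybrid one.
nanopore_fraction = hybrid_bases / non_hybrid_bases

print(f"Nanopore bases at 7x:  {hybrid_bases:.3g}")
print(f"Nanopore bases at 35x: {non_hybrid_bases:.3g}")
print(f"Fraction of Nanopore data needed: {nanopore_fraction:.2f}")
```

Under these assumptions the hybrid approach needs one fifth of the Nanopore data used by the non-hybrid assembly, which is in line with the abstract's claim of less than 1/4 of the cost once the additional 10x-Genomics sequencing is accounted for.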
