Inference of viral quasispecies with a paired de Bruijn graph

Journal

BIOINFORMATICS
Volume 37, Issue 4, Pages 473-481

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btaa782

Funding

  1. European Union's Horizon 2020 research and innovation programme [690941]
  2. Ministerio de Ciencia, Innovacion y Universidades [TIN2016-78011-C4-1-R, TIN2016-77158-C4-3-R, FPU17/02742]
  3. Xunta de Galicia [ED431C 2017/58, ED431G/01, IN848D2017-2350417, IN852A 2018/14]
  4. Academy of Finland [308030, 314170, 323233]
  5. Centro de Investigacion de Galicia CITIC - Xunta de Galicia
  6. European Union (European Regional Development Fund-Galicia 2014-2020 Program) [ED431G 2019/01]

The study presents viaDBG, a fast and accurate de Bruijn graph-based tool for de novo assembly of viral quasispecies. By iteratively correcting sequencing errors, using large k-mers, and incorporating paired-end information in the graph, viaDBG achieves both accuracy and speed in assembling viral quasispecies.
Motivation: RNA viruses exhibit a high mutation rate and therefore exist in infected cells as a population of closely related strains called viral quasispecies. The viral quasispecies assembly problem is to characterize the quasispecies present in a sample from high-throughput sequencing data. We study the de novo version of the problem, where reference sequences of the quasispecies are not available. Current methods for assembling viral quasispecies are based either on overlap graphs or on de Bruijn graphs. Overlap graph-based methods tend to be accurate but slow, whereas de Bruijn graph-based methods are fast but less accurate.
Results: We present viaDBG, a fast and accurate de Bruijn graph-based tool for de novo assembly of viral quasispecies. We first iteratively correct sequencing errors in the reads, which allows us to use large k-mers in the de Bruijn graph. To incorporate the paired-end information in the graph, we also adapt the paired de Bruijn graph for viral quasispecies assembly. These features enable the use of long-range information in contig construction without compromising the speed of de Bruijn graph-based approaches. Our experimental results show that viaDBG is both accurate and fast, whereas previous methods are either fast or accurate but not both. In particular, viaDBG has comparable or better accuracy than SAVAGE, while being at least nine times faster. Furthermore, the speed of viaDBG is comparable to that of PEHaplo, but viaDBG can also retrieve low-abundance quasispecies, which are often missed by PEHaplo.
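
The ideas named in the abstract (filtering out erroneous k-mers, building a de Bruijn graph, and attaching paired-end information to it) can be illustrated with a minimal Python sketch. This is not viaDBG's implementation: the function names, the fixed `min_count` threshold, and the rule of linking every solid k-mer of a left read to every solid k-mer of its mate are assumptions made for illustration only. The sketch also assumes both mates are already given in the same orientation, and it filters k-mers once rather than correcting reads iteratively.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def solid_kmers(reads, k, min_count):
    """Keep k-mers seen at least min_count times (hypothetical threshold).

    Rare k-mers are treated as likely sequencing errors. This is a
    one-pass filter, not the iterative correction the abstract describes.
    """
    counts = Counter(km for read in reads for km in kmers(read, k))
    return {km for km, c in counts.items() if c >= min_count}


def build_dbg(solid, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are solid k-mers."""
    graph = defaultdict(set)
    for km in solid:
        graph[km[:-1]].add(km[1:])
    return graph


def paired_info(read_pairs, solid, k):
    """Map each solid k-mer of a left read to the solid k-mers of its mate.

    Assumption: mates share one orientation; real paired-end data would
    need the right read reverse-complemented first.
    """
    info = defaultdict(set)
    for left, right in read_pairs:
        right_solid = [km for km in kmers(right, k) if km in solid]
        for km in kmers(left, k):
            if km in solid:
                info[km].update(right_solid)
    return info


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "ACGTTC"]  # toy reads; the third has one error
    solid = solid_kmers(reads, k=4, min_count=2)
    graph = build_dbg(solid, k=4)
    pairs = paired_info([("ACGTA", "CGTAC")], solid, k=4)
    print(sorted(solid))
    print({node: sorted(succ) for node, succ in graph.items()})
    print({km: sorted(mates) for km, mates in pairs.items()})
```

In this toy setting the paired sets could help decide which branch to follow when two strains share a k-mer but differ further along, which is the kind of long-range information the abstract refers to. How viaDBG actually stores and uses this information is not specified here.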
