Article

Reconstruction of viral population structure from next-generation sequencing data using multicommodity flows

Journal

BMC BIOINFORMATICS
Volume 14

Publisher

BMC
DOI: 10.1186/1471-2105-14-S9-S2

Funding

  1. Centers for Disease Control and Prevention
  2. Agriculture and Food Research Initiative Competitive Grant from the USDA National Institute of Food and Agriculture [201167016-30331]
  3. Life Technology Grant Viral Metagenome Reconstruction Software for Ion Torrent PGM Sequencer
  4. NSF [IIS-0916401, IIS-0916948]
  5. Molecular Basis of Disease Fellowship, Georgia State University
  6. Directorate for Computer & Information Science & Engineering
  7. Division of Information & Intelligent Systems [0916401] Funding Source: National Science Foundation
  8. Division of Information & Intelligent Systems
  9. Directorate for Computer & Information Science & Engineering [0916948] Funding Source: National Science Foundation

Background: Highly mutable RNA viruses exist in infected hosts as heterogeneous populations of genetically close variants known as quasispecies. Next-generation sequencing (NGS) allows for analysing a large number of viral sequences from infected patients, presenting a novel opportunity for studying the structure of a viral population and understanding virus evolution, drug resistance and immune escape. Accurate reconstruction of the genetic composition of intra-host viral populations involves assembling the NGS short reads into whole-genome sequences and estimating the frequencies of individual viral variants. Although a few approaches have been developed for this task, accurate reconstruction of quasispecies populations remains largely unresolved.

Results: Two new methods, AmpMCF and ShotMCF, were developed for reconstruction of whole-genome intra-host viral variants and estimation of their frequencies, both based on Multicommodity Flows (MCFs). AmpMCF was designed for NGS reads obtained from individual PCR amplicons and ShotMCF for NGS shotgun reads. AmpMCF, based on a covering formulation, identifies a minimal set of quasispecies explaining all observed reads, whereas ShotMCF, based on a packing formulation, engages the maximal number of reads to generate the most probable set of quasispecies. Both methods were evaluated on simulated data in comparison to Maximum Bandwidth and ViSpA, previously developed state-of-the-art algorithms for estimating quasispecies spectra from NGS amplicon and shotgun reads, respectively. Both algorithms were accurate in estimating quasispecies frequencies, especially from large datasets.

Conclusions: The problem of viral population reconstruction from amplicon or shotgun NGS reads was solved using the MCF formulation. The two methods developed here, ShotMCF and AmpMCF, afford accurate reconstruction of the structure of the intra-host viral population from NGS reads.
The implementations of the algorithms are available at http://alan.cs.gsu.edu/vira.html (AmpMCF) and http://alan.cs.gsu.edu/NGS/?q=content/shotmcf (ShotMCF).
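
To make the covering idea in the abstract concrete, here is a minimal Python sketch. It is not AmpMCF and does not use multicommodity flows: it swaps in a plain greedy set cover to pick a small set of candidate variants that explains every read, then estimates frequencies by splitting each read evenly among the chosen variants that match it. The function names, the error-free exact-match rule, and the toy reads and candidates are all assumptions made for illustration.

```python
from collections import Counter


def read_matches(read, start, variant):
    """True if `read` equals `variant` at offset `start` (assumes error-free reads)."""
    return variant[start:start + len(read)] == read


def greedy_cover(reads, candidates):
    """Greedy set cover: choose few candidates that explain all explainable reads.

    `reads` is a list of (start, sequence) pairs. This is a stand-in for the
    covering formulation, not the MCF-based method described in the abstract.
    """
    explains = {
        v: {i for i, (start, seq) in enumerate(reads) if read_matches(seq, start, v)}
        for v in candidates
    }
    uncovered = set().union(*explains.values()) if explains else set()
    chosen = []
    while uncovered:
        # Take the candidate that explains the most still-unexplained reads.
        best = max(candidates, key=lambda v: len(explains[v] & uncovered))
        chosen.append(best)
        uncovered -= explains[best]
    return chosen, explains


def estimate_frequencies(reads, chosen, explains):
    """Split each read evenly among the chosen variants that match it."""
    weight = Counter()
    for i in range(len(reads)):
        owners = [v for v in chosen if i in explains[v]]
        for v in owners:
            weight[v] += 1 / len(owners)
    total = sum(weight.values())
    return {v: weight[v] / total for v in chosen} if total else {}


# Hypothetical toy data: three candidate variants and five aligned reads.
candidates = ["ACGTAC", "ACCTAC", "AGGTAA"]
reads = [(0, "ACG"), (2, "GTA"), (3, "TAC"), (1, "CCT"), (0, "ACC")]

chosen, explains = greedy_cover(reads, candidates)
frequencies = estimate_frequencies(reads, chosen, explains)
print(chosen, frequencies)
```

A packing-style variant would instead aim to use as many reads as possible rather than requiring every read to be explained; the greedy heuristic above gives no optimality guarantee in either case.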
