Article

A graph extension of the positional Burrows-Wheeler transform and its applications

Journal

ALGORITHMS FOR MOLECULAR BIOLOGY
Volume 12

Publisher

BMC
DOI: 10.1186/s13015-017-0109-9

Keywords

PBWT; Haplotype; Genome graph

Funding

  1. National Human Genome Research Institute of the National Institutes of Health [5U54HG007990]
  2. W. M. Keck Foundation [DT06172015]
  3. Simons Foundation [351901]
  4. ARCS Foundation

We present a generalization of the positional Burrows-Wheeler transform, or PBWT, to genome graphs, which we call the gPBWT. A genome graph is a collapsed representation of a set of genomes described as a graph. In a genome graph, a haplotype corresponds to a restricted form of walk. The gPBWT is a compressible representation of a set of these graph-encoded haplotypes that allows for efficient subhaplotype match queries. We give efficient algorithms for gPBWT construction and query operations. As a demonstration, we use the gPBWT to quickly count the number of haplotypes consistent with random walks in a genome graph, and with the paths taken by mapped reads; the results suggest that haplotype consistency information can be practically incorporated into graph-based read mappers. We estimate that, with the gPBWT, on the order of 100,000 diploid genomes, including all forms of structural variation, could be stored and made searchable for haplotype queries using a single large compute node.
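
The central query described above is counting how many haplotypes contain a given walk through the graph. What follows is a minimal Python sketch of a PBWT-style approach to that query, not the authors' gPBWT. It rests on several simplifying assumptions: nodes are non-negative integer IDs, haplotypes are forward-only lists of node IDs (the gPBWT proper also tracks node orientation, which this ignores), and rank queries are naive list scans. The class name `ToyGraphPBWT` and the `END` marker are hypothetical. Visits to each node are sorted by their reversed path prefix. This keeps visits arriving from the same predecessor contiguous and in the order they had at that predecessor, so a search interval can be carried from node to node with a rank step, as in the linear PBWT.

```python
from bisect import bisect_left
from collections import defaultdict

END = -1  # hypothetical marker: the haplotype stops at this node


class ToyGraphPBWT:
    """Toy, forward-only PBWT-style index over haplotypes given as node-ID lists.

    Assumption: node IDs are non-negative ints and node orientation is ignored.
    """

    def __init__(self, haplotypes):
        visits = defaultdict(list)  # node -> [(reversed prefix, hap ID, position)]
        for h, path in enumerate(haplotypes):
            for p, node in enumerate(path):
                visits[node].append((tuple(reversed(path[: p + 1])), h, p))
        self.B = {}      # node -> next node of each visit, in sorted visit order
        self.preds = {}  # node -> predecessor of each visit (-inf for path starts)
        for node, vs in visits.items():
            # Sorting by reversed prefix puts path starts first, then groups
            # visits by predecessor ID, keeping each group in the order the
            # visits had at that predecessor (ties broken by haplotype ID).
            vs.sort()
            self.B[node] = [
                haplotypes[h][p + 1] if p + 1 < len(haplotypes[h]) else END
                for _, h, p in vs
            ]
            self.preds[node] = [
                key[1] if len(key) > 1 else float("-inf") for key, _, _ in vs
            ]

    def count(self, walk):
        """Count occurrences of `walk` as a contiguous subpath of any haplotype."""
        if not walk or walk[0] not in self.B:
            return 0
        start, end = 0, len(self.B[walk[0]])  # all visits to the first node
        for prev, node in zip(walk, walk[1:]):
            if node not in self.B:
                return 0
            # Offset of the block of visits at `node` that arrive from `prev`.
            base = bisect_left(self.preds[node], prev)
            # Rank step: visits in [start, end) at `prev` that continue to
            # `node` land contiguously at `node` (naive O(n) rank via count).
            nxt = self.B[prev]
            start = base + nxt[:start].count(node)
            end = base + nxt[:end].count(node)
            if start >= end:
                return 0
        return end - start


if __name__ == "__main__":
    haps = [[1, 2, 4, 5], [1, 3, 4, 5], [1, 2, 4, 6], [2, 4, 5]]
    index = ToyGraphPBWT(haps)
    print(index.count([1, 2, 4]))  # 2
    print(index.count([4, 5]))     # 3
```

The sketch counts occurrences rather than distinct haplotypes, so a haplotype that traverses the walk twice (for example, around a cycle) contributes twice. It stores only a next-node array and predecessor offsets per node, never the haplotypes themselves. A practical index would replace the list scans with compressed structures that support fast rank queries.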