4.7 Article

Merging of multi-string BWTs with applications

Journal

BIOINFORMATICS
Volume 30, Issue 24, Pages 3524-3531

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btu584

Keywords

-

Funding

  1. Jackson Lab Center for Genome Dynamics: Evolution, Organization and Function [NIH P50GM076468]
  2. University of North Carolina Center for CISGen: Systems Genetics of Psychiatric Disorders [NIH P50HG006582, P50MH090338]

Ask authors/readers for more resources

Motivation: The throughput of genomic sequencing has increased to the point that is overrunning the rate of downstream analysis. This, along with the desire to revisit old data, has led to a situation where large quantities of raw, and nearly impenetrable, sequence data are rapidly filling the hard drives of modern biology labs. These datasets can be compressed via a multi-string variant of the Burrows-Wheeler Transform (BWT), which provides the side benefit of searches for arbitrary k-mers within the raw data as well as the ability to reconstitute arbitrary reads as needed. We propose a method for merging such datasets for both increased compression and downstream analysis. Results: We present a novel algorithm that merges multi-string BWTs in O(LCS x N) time where LCS is the length of their longest common substring between any of the inputs, and N is the total length of all inputs combined (number of symbols) using O(N x log(2)(F)) bits where F is the number of multi-string BWTs merged. This merged multi-string BWT is also shown to have a higher compressibility compared with the input multi-string BWTs separately. Additionally, we explore some uses of a merged multi-string BWT for bioinformatics applications.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available