☆ 4.7 Article

Merging of multi-string BWTs with applications

BIOINFORMATICS (2014)

Journal

BIOINFORMATICS

Volume 30, Issue 24, Pages 3524-3531

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btu584

Keywords

Funding

Jackson Lab Center for Genome Dynamics: Evolution, Organization and Function [NIH P50GM076468]
University of North Carolina Center for CISGen: Systems Genetics of Psychiatric Disorders [NIH P50HG006582, P50MH090338]

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Abstract

Motivation: The throughput of genomic sequencing has increased to the point that is overrunning the rate of downstream analysis. This, along with the desire to revisit old data, has led to a situation where large quantities of raw, and nearly impenetrable, sequence data are rapidly filling the hard drives of modern biology labs. These datasets can be compressed via a multi-string variant of the Burrows-Wheeler Transform (BWT), which provides the side benefit of searches for arbitrary k-mers within the raw data as well as the ability to reconstitute arbitrary reads as needed. We propose a method for merging such datasets for both increased compression and downstream analysis. Results: We present a novel algorithm that merges multi-string BWTs in O(LCS x N) time where LCS is the length of their longest common substring between any of the inputs, and N is the total length of all inputs combined (number of symbols) using O(N x log(2)(F)) bits where F is the number of multi-string BWTs merged. This merged multi-string BWT is also shown to have a higher compressibility compared with the input multi-string BWTs separately. Additionally, we explore some uses of a merged multi-string BWT for bioinformatics applications.

Merging of multi-string BWTs with applications

Journal

BIOINFORMATICS

Publisher

OXFORD UNIV PRESS

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Merging of multi-string BWTs with applications

Journal

BIOINFORMATICS

Publisher

OXFORD UNIV PRESS

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper