Space-efficient whole genome comparisons with Burrows-Wheeler transforms

Journal

JOURNAL OF COMPUTATIONAL BIOLOGY
Volume 12, Issue 4, Pages 407-415

Publisher

MARY ANN LIEBERT, INC
DOI: 10.1089/cmb.2005.12.407

Keywords

strings; Burrows-Wheeler transforms; BWT; comparative genomics; compressed suffix array

The starting point for any alignment of mammalian genomes is the computation of exact matches satisfying various criteria. Time-efficient, O(n), data structures for this computation, such as the suffix tree, require O(n log(n)) space, several times the space of the genomes themselves. Thus, any reasonable whole-genome comparative project finds itself requiring tens of gigabytes of RAM to maintain time-efficiency. This is beyond most modern workstations. With a new data structure, the compressed suffix array (CSA) implemented via the Burrows-Wheeler transform, we can trade time-efficiency for space-efficiency, taking O(n log(n)) time, but running in O(n) space, typically in total space less than or equal to that of the genomes themselves. If space is more expensive than time, this is an appropriate approach to consider. The most space-efficient implementation of this data structure requires 5 bits per nucleotide character to build on-line, in the worst case, and 2.5 bits per character to store once built. We present a description of this data structure and how it is used to obtain matches. An implementation (called bbbwt) is demonstrated by aligning two mammalian genomes on a modest workstation equipped with under 2 GB of free RAM in time superior to that of the implementations of other data structures.
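
To make the idea of finding exact matches through a Burrows-Wheeler transform concrete, here is a minimal sketch in Python. It is not the bbbwt implementation and makes no attempt at the space bounds quoted above: it builds the transform by naively sorting suffixes and stores uncompressed count tables. The function names `bwt` and `count_matches`, the `$` sentinel, and the example strings are assumptions introduced for illustration only. The search shown is the standard backward search over the transform, which may differ in detail from the matching procedure the paper describes.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform by naive suffix sorting (illustration only)."""
    s = text + "$"  # assumed sentinel; sorts before A, C, G, T in ASCII
    # Suffix array: start positions of all suffixes in lexicographic order.
    suffix_array = sorted(range(len(s)), key=lambda i: s[i:])
    # The BWT is the character preceding each sorted suffix (wrapping around).
    return "".join(s[i - 1] for i in suffix_array)


def count_matches(bwt_str: str, pattern: str) -> int:
    """Count exact occurrences of pattern using backward search on the BWT."""
    alphabet = sorted(set(bwt_str))

    # first_row[c]: number of characters in the text strictly smaller than c.
    first_row = {}
    total = 0
    for c in alphabet:
        first_row[c] = total
        total += bwt_str.count(c)

    # occ[c][i]: occurrences of c in bwt_str[:i] (uncompressed, O(n) per symbol).
    occ = {c: [0] * (len(bwt_str) + 1) for c in alphabet}
    for i, ch in enumerate(bwt_str):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)

    # Narrow the suffix-array interval [lo, hi) one pattern character at a time,
    # reading the pattern from right to left.
    lo, hi = 0, len(bwt_str)
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + occ[c][lo]
        hi = first_row[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("GATTACAGATTACA")
    print(transformed)
    print(count_matches(transformed, "ATTA"))
```

Each pattern character costs a constant number of table lookups, so the search itself never touches the original text. A space-efficient implementation would replace the full `occ` tables with a compressed or sampled representation; that trade-off is where the time-for-space exchange described in the abstract comes from.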
