Space-efficient and exact de Bruijn graph representation based on a Bloom filter

Journal

ALGORITHMS FOR MOLECULAR BIOLOGY
Volume 8

Publisher

BMC
DOI: 10.1186/1748-7188-8-22

Keywords

de novo assembly; de Bruijn graph; Bloom filter

Funding

  1. Agence Nationale de la Recherche (ANR), MAPPI [ANR-10-COSI-0004]
  2. Agence Nationale de la Recherche (ANR), GATB [ANR-12-EMMA-0019]

Background: The de Bruijn graph data structure is widely used in next-generation sequencing (NGS). Many programs, e.g. de novo assemblers, rely on an in-memory representation of this graph. However, current techniques for representing the de Bruijn graph of a human genome require a large amount of memory (at least 30 GB).

Results: We propose a new encoding of the de Bruijn graph that occupies an order of magnitude less space than current representations. The encoding is based on a Bloom filter, with an additional structure that removes critical false positives.

Conclusions: Minia, an assembler implementing this structure, performed a complete de novo assembly of human genome short reads using 5.7 GB of memory in 23 hours.
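The idea summarized in the Results can be illustrated with a small Python sketch. It is not Minia's implementation: the class and function names (`BloomFilter`, `BloomDeBruijnGraph`, `candidate_neighbors`, `traverse`), the hash construction, the parameters, and the toy read are all assumptions chosen for illustration. The sketch also ignores reverse complements and keeps the exact k-mer set in memory while building, which a real assembler would likely avoid. It assumes that "critical" false positives are the k-mers that the Bloom filter wrongly accepts and that are adjacent to a true k-mer, so that storing only those is enough to make a traversal started from a true node exact.

```python
import hashlib
from collections import deque


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative, not tuned)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive independent hash functions by salting BLAKE2b with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item)
        )


def kmers_of(read, k):
    """All k-mers of a read (reverse complements ignored for simplicity)."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def candidate_neighbors(kmer):
    """The k-mers that overlap `kmer` by k-1 bases, to the right and left."""
    for base in "ACGT":
        yield kmer[1:] + base
        yield base + kmer[:-1]


class BloomDeBruijnGraph:
    """Bloom filter plus an explicit set of critical false positives."""

    def __init__(self, kmers, num_bits, num_hashes=3):
        kmers = set(kmers)  # exact set is used only during construction
        self.bloom = BloomFilter(num_bits, num_hashes)
        for kmer in kmers:
            self.bloom.add(kmer)
        # Critical false positives: neighbors of true k-mers that the Bloom
        # filter accepts although they are not in the graph.
        self.critical_fp = {
            cand
            for kmer in kmers
            for cand in candidate_neighbors(kmer)
            if cand in self.bloom and cand not in kmers
        }

    def contains(self, kmer):
        # Exact for any k-mer adjacent to a true node; may still be wrong
        # for arbitrary k-mers far from the graph.
        return kmer in self.bloom and kmer not in self.critical_fp

    def neighbors(self, kmer):
        return sorted({c for c in candidate_neighbors(kmer) if self.contains(c)})


def traverse(graph, start):
    """Breadth-first traversal from a k-mer known to be in the graph."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Toy example: a deliberately tiny filter forces many false positives.
read = "ACGTTGCATGGCCATAGCATTACG"
true_kmers = kmers_of(read, k=5)
graph = BloomDeBruijnGraph(true_kmers, num_bits=64, num_hashes=3)
reachable = traverse(graph, start=read[:5])
```

In this toy setting, `reachable` coincides with `true_kmers` even though the filter is heavily overloaded, because every false positive that a traversal could step onto is listed in `critical_fp`. The space saving claimed in the abstract would come from the Bloom filter and the critical false positive set together being much smaller than an explicit k-mer table; the sketch does not attempt to reproduce those figures.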