KMC 2: fast and resource-frugal k-mer counting

BIOINFORMATICS (2015)

Journal

BIOINFORMATICS

Volume 31, Issue 10, Pages 1569-1576

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btv022

Funding

Polish National Science Centre [DEC-2012/05/B/ST6/03148]
'GeCONiI-Upper Silesian Center for Computational Science and Engineering' [POIG.02.03.01-24-099/13]

Abstract

Motivation: Building the histogram of occurrences of every k-symbol-long substring of nucleotide data is a standard step in many bioinformatics applications, known as k-mer counting. Its applications include developing de Bruijn graph genome assemblers, fast multiple sequence alignment and repeat detection. The tremendous amounts of NGS data require fast algorithms for k-mer counting, preferably using moderate amounts of memory.

Results: We present a novel method for k-mer counting that, on large datasets, is about twice as fast as the strongest competitors (Jellyfish 2, KMC 1) while using about 12 GB (or less) of RAM. Our disk-based method bears some resemblance to MSPKmerCounter, yet replacing the original minimizers with signatures (a carefully selected subset of all minimizers) and using (k, x)-mers significantly reduces the I/O, and a highly parallel overall architecture achieves unprecedented processing speeds. For example, KMC 2 counts the 28-mers of a human reads collection with 44-fold coverage (106 GB compressed) in about 20 min, on a 6-core Intel i7 PC with a solid-state disk.
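
To make the terms in the abstract concrete, the following is a minimal Python sketch of k-mer counting with minimizer-based binning. It is an illustration under simplifying assumptions, not KMC 2's implementation: it uses plain lexicographic minimizers rather than the paper's signatures, keeps bins in memory rather than on disk, does not use (k, x)-mers, and ignores reverse complements and non-ACGT symbols. The function names (`kmers`, `minimizer`, `count_kmers_binned`, `histogram`) and the parameter `m` (minimizer length) are hypothetical names chosen for this example.

```python
from collections import Counter, defaultdict


def kmers(read, k):
    """Yield every k-symbol-long substring of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def minimizer(kmer, m):
    """Return the lexicographically smallest m-mer inside a k-mer.

    This is a plain minimizer, not a KMC 2 signature (assumption for the sketch).
    """
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def count_kmers_binned(reads, k, m):
    """Count k-mers by first distributing them into bins keyed by minimizer.

    All copies of a given k-mer share the same minimizer, so they land in the
    same bin and each bin can be counted independently of the others.
    """
    bins = defaultdict(list)
    for read in reads:
        for kmer in kmers(read, k):
            bins[minimizer(kmer, m)].append(kmer)

    counts = Counter()
    for bin_kmers in bins.values():
        counts.update(bin_kmers)  # each bin is processed on its own
    return counts


def histogram(counts):
    """Map each occurrence count to the number of distinct k-mers having it."""
    return Counter(counts.values())


if __name__ == "__main__":
    reads = ["ACGTACGT", "CGTA"]
    counts = count_kmers_binned(reads, k=4, m=2)
    print(dict(counts))
    print(dict(histogram(counts)))
```

Because identical k-mers always fall into the same bin, a disk-based tool can write each bin to its own file and count bins one at a time, which is what keeps memory use bounded in this family of methods.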
