4.7 Article

ntHash2: recursive spaced seed hashing for nucleotide sequences

Journal

BIOINFORMATICS
Volume 38, Issue 20, Pages 4812-4813

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btac564

Funding

  1. National Institutes of Health [2R01HG007182-04A1]

ntHash2 is a fast algorithm for spaced seed hashing that can be integrated into various bioinformatics tools for efficient sequence analysis in genome research. It is faster than previous versions and conventional hashing algorithms, and also improves the uniformity of hash distribution.
Motivation: Spaced seeds are robust alternatives to k-mers for analyzing nucleotide sequences with high base mismatch rates. Hashing is also crucial for efficiently storing abundant sequence data. Here, we introduce ntHash2, a fast algorithm for spaced seed hashing that can be integrated into various bioinformatics tools for efficient sequence analysis with applications in genome research.
Results: ntHash2 is up to 2.1x faster at hashing various spaced seeds than the previous version and 3.8x faster than conventional hashing algorithms with naive adaptation. Additionally, we reduced the collision rate of ntHash for longer k-mer lengths and improved the uniformity of the hash distribution by modifying the canonical hashing mechanism.
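
To illustrate why spaced seeds tolerate mismatches, here is a minimal Python sketch. It is not the ntHash2 algorithm and does not use its API: it rehashes every window from scratch with a general-purpose hash, which is closer to the "naive adaptation" baseline than to a recursive scheme. The function names, the mask notation ('1' for a position that counts, '0' for a position that is ignored), and the choice of BLAKE2b are all assumptions made for illustration.

```python
import hashlib


def spaced_seed_words(seq, mask):
    """Yield the masked word for every window of len(mask) in seq.

    Assumed mask notation: '1' marks a position that is kept,
    '0' marks a don't-care position that is skipped.
    """
    k = len(mask)
    care = [i for i, m in enumerate(mask) if m == "1"]
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        yield "".join(window[i] for i in care)


def naive_hash(word):
    """Hash a masked word to a 64-bit integer (illustrative, not ntHash)."""
    digest = hashlib.blake2b(word.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def spaced_seed_hashes(seq, mask):
    """Hash every window of seq under the given spaced seed mask."""
    return [naive_hash(w) for w in spaced_seed_words(seq, mask)]


if __name__ == "__main__":
    a = "ACGTA"
    b = "ACTTA"  # differs from a only at index 2
    # Index 2 is a don't-care position in this mask, so the hashes agree.
    print(spaced_seed_hashes(a, "11011") == spaced_seed_hashes(b, "11011"))
    # A contiguous k-mer (all positions kept) sees the mismatch.
    print(list(spaced_seed_words(a, "11111")) == list(spaced_seed_words(b, "11111")))
```

Because every window is rehashed in full, the cost of this sketch grows with both the sequence length and the seed length; avoiding that repeated work is the kind of overhead a recursive hashing scheme is meant to reduce.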
