Article

ntHash2: recursive spaced seed hashing for nucleotide sequences

BIOINFORMATICS (2022)

Journal

BIOINFORMATICS

Volume 38, Issue 20, Pages 4812-4813

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btac564

Categories

Biochemical Research Methods; Biotechnology & Applied Microbiology; Computer Science, Interdisciplinary Applications; Mathematical & Computational Biology; Statistics & Probability

Funding

National Institutes of Health [2R01HG007182-04A1]

Summary

ntHash2 is a fast algorithm for spaced seed hashing that can be integrated into various bioinformatics tools for efficient sequence analysis in genome research. It is faster than previous versions and conventional hashing algorithms, and also improves the uniformity of hash distribution.

Abstract

Motivation: Spaced seeds are robust alternatives to k-mers in analyzing nucleotide sequences with high base mismatch rates. Hashing is also crucial for efficiently storing abundant sequence data. Here, we introduce ntHash2, a fast algorithm for spaced seed hashing that can be integrated into various bioinformatics tools for efficient sequence analysis with applications in genome research.

Results: ntHash2 is up to 2.1x faster at hashing various spaced seeds than the previous version and 3.8x faster than conventional hashing algorithms with naive adaptation. Additionally, we reduced the collision rate of ntHash for longer k-mer lengths and improved the uniformity of the hash distribution by modifying the canonical hashing mechanism.
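To make the terms in the abstract concrete, the following is a minimal Python sketch of a rotate-and-XOR rolling hash over k-mers and a straightforward per-window spaced seed hash. It is an illustration only and not the ntHash2 implementation: the per-base constants in BASE_HASH are arbitrary values picked for this example, the function names are hypothetical, and the spaced seed function rehashes every window from scratch rather than using the recursive update that the paper describes. Canonical (strand-independent) hashing is left out.

```python
MASK64 = (1 << 64) - 1

# Arbitrary 64-bit constants for illustration only (assumption);
# these are NOT the seed values used by ntHash or ntHash2.
BASE_HASH = {
    "A": 0x9E3779B97F4A7C15,
    "C": 0xC2B2AE3D27D4EB4F,
    "G": 0x165667B19E3779F9,
    "T": 0x27D4EB2F165667C5,
}


def rol(x, r):
    """Rotate a 64-bit integer left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK64


def hash_kmer(kmer):
    """Hash a k-mer from scratch: each base hash is rotated by its distance from the end."""
    h = 0
    for base in kmer:
        h = rol(h, 1) ^ BASE_HASH[base]
    return h


def rolling_hashes(seq, k):
    """Yield the hash of every k-mer in seq, updating in O(1) per step."""
    if len(seq) < k:
        return
    h = hash_kmer(seq[:k])
    yield h
    for i in range(k, len(seq)):
        outgoing = BASE_HASH[seq[i - k]]
        incoming = BASE_HASH[seq[i]]
        # Rotate the whole hash, cancel the outgoing base, add the incoming one.
        h = rol(h, 1) ^ rol(outgoing, k) ^ incoming
        yield h


def spaced_seed_hash(window, seed):
    """Hash only the 'care' positions (marked '1') of a window.

    Positions marked '0' are ignored, so a mismatch there leaves the hash unchanged.
    This recomputes each window from scratch (no recursive update).
    """
    if len(window) != len(seed):
        raise ValueError("window and seed must have the same length")
    h = 0
    for base, care in zip(window, seed):
        h = rol(h, 1)
        if care == "1":
            h ^= BASE_HASH[base]
    return h


if __name__ == "__main__":
    seed = "1101011"
    # The two windows differ only at index 2, a don't-care position of the seed.
    print(spaced_seed_hash("ACGTACG", seed) == spaced_seed_hash("ACTTACG", seed))
```

With this seed, the two windows in the usage lines hash to the same value because they differ only at a don't-care position, which is the property that makes spaced seeds tolerant of base mismatches. A seed made entirely of '1' characters reduces spaced_seed_hash to hash_kmer.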
