4.7 Article

CHTKC: a robust and efficient k-mer counting algorithm based on a lock-free chaining hash table

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 22, Issue 3, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbaa063

Keywords

assembly; DNA-seq; hash table; sequence analysis; k-mer counting; algorithm

Funding

  1. National Natural Science Foundation of China [61771165]
  2. International Postdoctoral Exchange Fellowship [20130053]
  3. China Postdoctoral Science Foundation [2018T110302, 2014M551246]


The paper proposes a new method called CHTKC to efficiently calculate the frequency of each substring of length k in DNA sequences, using a lock-free hash table and linked lists to resolve collisions and optimize memory usage. Thorough testing on multiple datasets shows that using a hash-table-based method remains a feasible solution for the k-mer counting problem.
Motivation: Calculating the frequency of occurrence of each substring of length k in DNA sequences is a common task in many bioinformatics applications, including genome assembly, error correction, and sequence alignment. Although the problem is simple, efficient counting of datasets with high sequencing depth or large genome size is a challenge.

Results: We propose a robust and efficient method, CHTKC, to solve the k-mer counting problem with a lock-free hash table that uses linked lists to resolve collisions. We also design new mechanisms to optimize memory usage and handle situations where memory is not enough to accommodate all k-mers. CHTKC has been thoroughly tested on seven datasets under multiple memory usage scenarios and compared with Jellyfish2 and KMC3. Our work shows that using a hash-table-based method to effectively solve the k-mer counting problem remains a feasible solution.
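
To make the idea of k-mer counting with a chaining hash table concrete, here is a minimal single-threaded sketch in Python. It is not the CHTKC implementation: it is not lock-free, it does not handle the out-of-memory case, and it does not merge a k-mer with its reverse complement. The names `Node`, `ChainingKmerTable`, `count_kmers`, and `num_buckets` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterator, List, Optional, Tuple


class Node:
    """One chain entry: a k-mer, its count, and a link to the next entry."""

    __slots__ = ("kmer", "count", "next")

    def __init__(self, kmer: str, next_node: Optional["Node"]) -> None:
        self.kmer = kmer
        self.count = 1
        self.next = next_node


class ChainingKmerTable:
    """Fixed-size hash table whose collisions are resolved by linked lists."""

    def __init__(self, num_buckets: int = 1024) -> None:
        # Each bucket holds the head of a singly linked chain (or None).
        self.buckets: List[Optional[Node]] = [None] * num_buckets

    def add(self, kmer: str) -> None:
        index = hash(kmer) % len(self.buckets)
        node = self.buckets[index]
        # Walk the chain; increment the count if the k-mer is already stored.
        while node is not None:
            if node.kmer == kmer:
                node.count += 1
                return
            node = node.next
        # Not found: prepend a new node to this bucket's chain.
        self.buckets[index] = Node(kmer, self.buckets[index])

    def items(self) -> Iterator[Tuple[str, int]]:
        for head in self.buckets:
            node = head
            while node is not None:
                yield node.kmer, node.count
                node = node.next


def count_kmers(sequence: str, k: int, num_buckets: int = 1024) -> Dict[str, int]:
    """Count every substring of length k in `sequence`."""
    table = ChainingKmerTable(num_buckets)
    for i in range(len(sequence) - k + 1):
        table.add(sequence[i:i + k])
    return dict(table.items())
```

For example, `count_kmers("ACGTACGT", 3)` counts the six overlapping 3-mers of the sequence. Setting `num_buckets=1` forces every k-mer into a single chain, which exercises the collision path without changing the counts. A concurrent version would presumably replace the plain assignments with atomic compare-and-swap operations, which this sketch does not attempt.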
