Article

Mining Discriminative K-Mers in DNA Sequences Using Sketches and Hardware Acceleration

Journal

IEEE ACCESS
Volume 8, Pages 114715-114732

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2020.3003918

Keywords

Discriminative k-mers; heavy hitters; counting sketches; parallel processing; hardware acceleration; field-programmable gate arrays

Funding

  1. Agencia Nacional de Investigacion y Desarrollo (ANID) through Fondecyt [1180995, 11160375]
  2. ANID Basal Projects [FB0001, FB0008]
  3. ANID Magister Nacional Scholarships

Extracting discriminative k-mers is an important and challenging problem in DNA sequence analysis, with applications in metagenomics and motif discovery. Although multiple computational tools have been designed for this purpose, most discriminative k-mer discovery methods suffer from long execution times and high memory usage when processing large datasets. This paper presents a novel approach for discriminative k-mer discovery in DNA sequences, which leverages streaming and sketch algorithms to reduce space complexity and expose data parallelism, enabling the use of parallel platforms to accelerate computationally intensive operations. To assess the performance of our method, we designed and implemented two versions of the algorithm that leverage parallelization at different levels: (i) a software version tailored for multithreading and vector instructions in commodity CPUs, and (ii) a custom architecture implemented on a Field-Programmable Gate Array (FPGA) accelerator that exploits fine-grained parallelism and deep pipelining on reconfigurable logic. Experimental results show that, when mining discriminative k-mers from a set of well-known ChIP-seq sequences, our parallel software implementation executes at least 15% faster than exact-counting tools and requires at least five times less memory when processing large datasets. More importantly, we designed a custom FPGA-based accelerator for our algorithm on a Xilinx KCU1500 board, which achieves speedups above 78x over our parallel software implementation on the largest datasets. The accelerator uses less than 3% of the logic resources available on the on-board XCKU115 Kintex UltraScale FPGA, and between 12% and 70% of the memory resources, depending on the size of the dataset.
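
The abstract does not spell out the algorithm itself, so the following is only a rough illustration of the general idea it names: counting k-mers approximately with a fixed-size sketch, then ranking k-mers by how much more often they occur in one sequence set than in another. It uses a Count-Min sketch, a common counting sketch that may differ from the one used in the paper. All names here (`CountMinSketch`, `kmers`, `discriminative_kmers`), the scoring rule (difference of estimated counts), and the default parameters are assumptions made for this example, not the authors' implementation.

```python
import hashlib


class CountMinSketch:
    """Fixed-size approximate counter; estimates never fall below true counts."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One independent-looking hash per row, derived by salting BLAKE2b
        # with the row number (an illustrative choice, not from the paper).
        digest = hashlib.blake2b(
            item.encode(), digest_size=8, salt=row.to_bytes(2, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.rows[row][self._index(item, row)] += count

    def estimate(self, item):
        # The minimum over rows is the least collision-inflated counter.
        return min(
            self.rows[row][self._index(item, row)] for row in range(self.depth)
        )


def kmers(seq, k):
    """Yield every length-k substring of seq, in order."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def discriminative_kmers(positives, negatives, k, top=5, width=1024, depth=4):
    """Rank k-mers by estimated count in positives minus count in negatives."""
    pos = CountMinSketch(width, depth)
    neg = CountMinSketch(width, depth)
    # Assumption for brevity: candidates are kept in an exact set. A real
    # streaming method would keep only a bounded list of heavy hitters.
    candidates = set()
    for seq in positives:
        for km in kmers(seq, k):
            pos.add(km)
            candidates.add(km)
    for seq in negatives:
        for km in kmers(seq, k):
            neg.add(km)
    scored = [(km, pos.estimate(km) - neg.estimate(km)) for km in candidates]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top]


if __name__ == "__main__":
    # Toy sequences invented for this example.
    positives = ["ACGTACGTAC", "TTACGTGG"]
    negatives = ["TTTTTTTT", "GGGGCCCC"]
    for km, score in discriminative_kmers(positives, negatives, k=4, top=3):
        print(km, score)
```

Because each sketch is a fixed array of counters, memory use depends on `width` and `depth` rather than on the number of distinct k-mers, and the per-row updates are independent of one another, which is the kind of data parallelism the abstract refers to.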
