Mining Discriminative K-Mers in DNA Sequences Using Sketches and Hardware Acceleration

IEEE ACCESS (2020)

Journal

IEEE ACCESS

Volume 8, Pages 114715-114732

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/ACCESS.2020.3003918

Keywords

Discriminative k-mers; heavy hitters; counting sketches; parallel processing; hardware acceleration; field-programmable gate arrays

Categories

Computer Science, Information Systems; Engineering, Electrical & Electronic; Telecommunications

Funding

Agencia Nacional de Investigacion y Desarrollo (ANID) through Fondecyt [1180995, 11160375]
ANID Basal Projects [FB0001, FB0008]
ANID Magister Nacional Scholarships

Abstract

Extracting discriminative k-mers is an important and challenging problem in DNA sequence analysis with applications in metagenomics and motif discovery. Despite the availability of multiple computational tools designed for this purpose, most discriminative k-mer discovery methods suffer from long execution times and high memory usage when processing large datasets. This paper presents a novel approach for discriminative k-mer discovery in DNA sequences, which leverages streaming and sketch algorithms to reduce space complexity and expose data parallelism, enabling the use of parallel platforms for accelerating the execution of computationally-intensive operations. To assess the performance of our method, we designed and implemented two versions of the algorithm that leverage parallelization at different levels: (i) a software version tailored for multithreading and vector instructions in commodity CPUs, and (ii) a custom architecture implemented on a Field-Programmable Gate Array (FPGA) accelerator that exploits fine-grain parallelism and deep pipelining on reconfigurable logic. Experimental results show that, when mining discriminative k-mers from a set of well-known ChIP-seq sequences, our parallel software implementation executes at least 15% faster than exact-counting tools, and requires at least five times less memory when processing large datasets. More importantly, we designed a custom FPGA-based accelerator for our algorithm on a Xilinx KCU1500 board, which achieves speedups above 78x with the largest datasets, compared to our parallel software implementation. The accelerator uses less than 3% of the logic resources available on the on-board XCKU115 Kintex-7 Ultrascale FPGA, and between 12% and 70% of the memory resources, depending on the size of the dataset.
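The core idea in the abstract, replacing exact k-mer counts with a fixed-size sketch, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which sketch or scoring rule the paper uses, so the Count-Min sketch, the `min_ratio` threshold, and all names below (`kmers`, `CountMinSketch`, `discriminative_kmers`) are assumptions chosen for illustration. The example also keeps an exact candidate set for simplicity, which a real streaming design would replace with a bounded heavy-hitter structure.

```python
import hashlib
from typing import Dict, Iterable, Iterator, Tuple


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every length-k substring of seq made up only of A, C, G, T."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield kmer


class CountMinSketch:
    """Fixed-size approximate counter; estimates can overcount, never undercount."""

    def __init__(self, width: int = 1 << 12, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One salted hash per row stands in for `depth` independent hash functions.
        digest = hashlib.blake2b(
            item.encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for r in range(self.depth):
            self.rows[r][self._index(item, r)] += count

    def estimate(self, item: str) -> int:
        # The minimum over rows is the least-inflated counter for this item.
        return min(self.rows[r][self._index(item, r)] for r in range(self.depth))


def discriminative_kmers(
    positive: Iterable[str],
    negative: Iterable[str],
    k: int,
    min_ratio: float = 2.0,  # hypothetical threshold, not taken from the paper
) -> Dict[str, Tuple[int, int]]:
    """Return k-mers whose estimated positive count dominates the negative one."""
    pos, neg = CountMinSketch(), CountMinSketch()
    candidates = set()  # exact set kept only to keep the example short
    for seq in positive:
        for km in kmers(seq, k):
            pos.add(km)
            candidates.add(km)
    for seq in negative:
        for km in kmers(seq, k):
            neg.add(km)
    hits = {}
    for km in candidates:
        p, n = pos.estimate(km), neg.estimate(km)
        if p >= min_ratio * (n + 1):  # +1 avoids rewarding k-mers seen only once
            hits[km] = (p, n)
    return hits


if __name__ == "__main__":
    found = discriminative_kmers(
        ["ACGTACGTACGT", "TTACGTAA"], ["TTTTTTTT", "GGGGCCCC"], k=4
    )
    for kmer, (p, n) in sorted(found.items()):
        print(kmer, p, n)
```

Because every update touches `depth` independent counters at fixed positions, the sketch updates are easy to run in parallel, which is the property the abstract credits for making the method suitable for multithreaded CPUs and FPGA pipelines.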
