Article

RabbitFX: Efficient Framework for FASTA/Q File Parsing on Modern Multi-Core Platforms

Publisher

IEEE Computer Society
DOI: 10.1109/TCBB.2022.3219114

Keywords

Sequential analysis; Instruction sets; Task analysis; Bioinformatics; Runtime; Codes; Arrays; Next generation sequencing; FASTQ; FASTA; I/O; file parsing; multi-core CPUs; HPC

The continuous growth of generated sequencing data has led to the development of a variety of associated bioinformatics tools. However, many of them cannot fully exploit the resources of modern multi-core systems because they are bottlenecked by file parsing, which leads to slow execution times. This motivates the design of an efficient method for parsing sequencing data that can exploit the power of modern hardware, especially modern CPUs paired with fast storage devices. We have developed RabbitFX, a fast, efficient, and easy-to-use framework for processing biological sequencing data on modern multi-core platforms. It efficiently reads FASTA and FASTQ files by combining a lightweight parsing method with an optimized formatting implementation. Furthermore, we provide user-friendly, modularized C++ APIs that can be easily integrated into applications to increase their file parsing speed. As a proof of concept, we have integrated RabbitFX into three I/O-intensive applications: fastp, Ktrim, and Mash. Our evaluation shows that including RabbitFX yields speedups of at least 11.6× (6.6×), 2.4× (2.4×), and 3.7× (3.2×), respectively, over the original versions on plain (gzip-compressed) files. These case studies demonstrate that RabbitFX can be easily integrated into a variety of NGS analysis tools to significantly reduce their runtimes. RabbitFX is open-source software available at https://github.com/RabbitBio/RabbitFX.
