Journal
JOURNAL OF COMPUTATIONAL SCIENCE
Volume 60, Issue -, Pages -
Publisher
ELSEVIER
DOI: 10.1016/j.jocs.2022.101572
Keywords
Bioinformatics; Next-generation sequencing; RNA-seq; Transcriptomics; Read mapping; Hashing; Parallelism; Big data
Funding
- RMU Initiative Funding for Research by the Rhine-Main Universities (Johannes Gutenberg University Mainz) as part of the project RMU Network for Deep Continuous-Discrete Machine Learning (DeCoDeML)
- RMU Initiative Funding for Research by the Rhine-Main Universities (Goethe University Frankfurt) as part of the project RMU Network for Deep Continuous-Discrete Machine Learning (DeCoDeML)
- RMU Initiative Funding for Research by the Rhine-Main Universities (TU Darmstadt) as part of the project RMU Network for Deep Continuous-Discrete Machine Learning (DeCoDeML)
RNACache is a novel approach based on context-aware locality-sensitive hashing for detecting local similarities between transcriptomes and RNA-seq reads. It uses a three-step processing pipeline to accurately identify truly expressed transcript isoforms, and it offers better performance and scalability than other lightweight mapping tools.
Mapping reads to transcriptomes is a crucial initial step in bioinformatics RNA-seq pipelines. Because alignment-based methods have high computational complexity, lightweight alignment-free methods are becoming increasingly important. We present RNACache, a novel approach to detecting local similarities between transcriptomes and RNA-seq reads based on context-aware locality-sensitive hashing. It uses a three-step processing pipeline, consisting of k-mer subsampling, match-based (online) filtering, and coverage-based filtering, to identify truly expressed transcript isoforms. Our performance evaluation shows that RNACache produces highly accurate transcriptomic mappings with significantly fewer erroneous matches than the state-of-the-art lightweight mappers RapMap, Salmon, and Kallisto. Furthermore, it scales well with the number of utilized CPU cores and achieves the best runtime performance at low memory consumption on modern multi-core workstations. This is an extended version of our previously published conference paper (Cascitti et al., 2021). RNACache is available at https://github.com/jcasc/rnacache.
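The three-step pipeline summarized in the abstract can be illustrated with a toy Python sketch. It assumes a windowed min-hash subsampling scheme (keeping the SKETCH smallest k-mer hashes per window), scores hits over runs of consecutive transcript windows as a simplified stand-in for context-aware matching, and applies a window-coverage threshold as the final filter. All names, parameter values (K, WINDOW, SKETCH, min_hits, span, min_coverage) and the use of blake2b as the hash are hypothetical choices for illustration; this is not RNACache's implementation, and it ignores reverse complements and sequencing errors.

```python
import hashlib
import random
from collections import Counter, defaultdict

K, WINDOW, SKETCH = 16, 64, 4   # hypothetical k-mer length, window length, sketch size
STEP = WINDOW - K + 1           # consecutive windows overlap by K-1 bases (no shared k-mers)


def kmer_hash(kmer):
    # Deterministic 64-bit hash (an assumption, not RNACache's hash function).
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "little")


def windows(seq):
    # Split a sequence into overlapping windows of length WINDOW, advancing by STEP.
    return [seq[s:s + WINDOW] for s in range(0, max(len(seq) - K + 1, 1), STEP)]


def sketch(window):
    # Step 1 (subsampling): keep only the SKETCH smallest k-mer hashes of a window.
    hashes = {kmer_hash(window[i:i + K]) for i in range(len(window) - K + 1)}
    return sorted(hashes)[:SKETCH]


def build_index(transcripts):
    # Map each sampled feature to the (transcript, window) locations where it occurs.
    index, n_windows = defaultdict(list), {}
    for name, seq in transcripts.items():
        wins = windows(seq)
        n_windows[name] = len(wins)
        for w, win in enumerate(wins):
            for h in sketch(win):
                index[h].append((name, w))
    return index, n_windows


def map_read(read, index, min_hits=2, span=2):
    # Step 2 (match-based filtering): count feature hits per transcript window, score
    # every run of `span` consecutive windows, keep transcripts scoring >= min_hits.
    hits = defaultdict(Counter)
    for win in windows(read):
        for h in sketch(win):
            for name, w in index.get(h, ()):
                hits[name][w] += 1
    result = {}
    for name, per_window in hits.items():
        score, start = max(
            (sum(per_window.get(w + d, 0) for d in range(span)), w) for w in per_window
        )
        if score >= min_hits:
            result[name] = (score, start)
    return result


def coverage_filter(mappings, n_windows, span=2, min_coverage=0.5):
    # Step 3 (coverage-based filtering): keep a transcript only if the windows hit
    # by its mapped reads cover at least `min_coverage` of all its windows.
    covered = defaultdict(set)
    for read_result in mappings:
        for name, (_, start) in read_result.items():
            covered[name].update(range(start, min(start + span, n_windows[name])))
    return {name for name, ws in covered.items() if len(ws) / n_windows[name] >= min_coverage}


# Toy usage: two random 300-base "transcripts", two reads from A and one from B.
rng = random.Random(42)
A = "".join(rng.choice("ACGT") for _ in range(300))
B = "".join(rng.choice("ACGT") for _ in range(300))
index, n_windows = build_index({"A": A, "B": B})
reads = [A[0:113], A[147:260], B[0:113]]
mappings = [map_read(r, index) for r in reads]
print(mappings)
print(coverage_filter(mappings, n_windows))
```

In this toy run, each read should map to the transcript it was drawn from, and the coverage filter should drop B because its single read touches only 2 of its 6 windows. The reads were deliberately cut at window boundaries so their sketches line up with the transcript's; with real, unaligned reads only a fraction of sampled features coincide, which is why the match threshold is a tunable parameter in this sketch.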