Article

IMperm: a fast and comprehensive IMmune Paired-End Reads Merger for sequencing data

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 24, Issue 2

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbad080

Keywords

software; high-throughput sequencing; paired-end reads assembly; immune repertoire; MRD detection

The adaptive immune receptor repertoire (AIRR) is crucial in cancer immunotherapy and minimal residual disease (MRD) detection. IMperm is a software package that efficiently merges paired-end (PE) reads, including low-quality reads and reads with minor or no overlap. Compared with existing tools, IMperm performed better on both simulated and sequencing data, and it was also effective on PE reads from other sources.
The adaptive immune receptor repertoire (AIRR), consisting of T- and B-cell receptors, is the core component of the immune system. AIRR sequencing is commonly used in cancer immunotherapy and in minimal residual disease (MRD) detection for leukemia and lymphoma. The AIRR is captured by primers and sequenced to yield paired-end (PE) reads, which can be merged into one sequence using the region where they overlap. However, the wide variety of AIRR data makes this merging difficult, so a dedicated tool is required. We developed a software package for IMmune PE reads merger of sequencing data, named IMperm. It uses a k-mer-and-vote strategy to pin down the overlapped region rapidly. IMperm can handle all types of PE reads, eliminate adapter contamination and successfully merge low-quality reads as well as reads with minor or no overlap. Compared with existing tools, IMperm performed better on both simulated and sequencing data. Notably, IMperm was well suited to processing MRD detection data from leukemia and lymphoma, and it detected 19 novel MRD clones in 14 patients with leukemia from previously published data. Additionally, IMperm can handle PE reads from other sources, and we demonstrated its effectiveness on two genomic datasets and one cell-free deoxyribonucleic acid dataset. IMperm is implemented in the C programming language and consumes little runtime and memory. It is freely available at https://github.com/zhangwei2015/IMperm.