
Fast and Accurate Algorithms for Mapping and Aligning Long Reads

JOURNAL OF COMPUTATIONAL BIOLOGY (2021)

Journal

JOURNAL OF COMPUTATIONAL BIOLOGY

Volume 28, Issue 8, Pages 789-803

Publisher

MARY ANN LIEBERT, INC

DOI: 10.1089/cmb.2020.0603

Keywords

long-read mapping and long-read local alignment; longest common subsequence with distance constraints; k-mer-based local alignment with variable value of k

Categories

Biochemical Research Methods; Biotechnology & Applied Microbiology; Computer Science, Interdisciplinary Applications; Mathematical & Computational Biology; Statistics & Probability

Funding

National Science Foundation of China [NSFC 61972329]
GRF grant for the Hong Kong Special Administrative Region, China [CityU 11210119]

Summary

Short sequencing reads limit DNA sequence analysis, a limitation that long reads can help address. Using newly designed mapping and local alignment algorithms, this study obtained improved alignments for Nanopore and SMRT data sets. The new method aligned a higher percentage of read letters to identical letters on reference genomes than the best known method, and it also ran faster.

Abstract

DNA sequence analysis faces challenging tasks such as the identification of structural variants, the sequencing of repetitive regions, and the phasing of alleles. These tasks suffer from the short length of sequencing reads, where each read may cover fewer than two single nucleotide polymorphisms (SNPs) or fewer than two occurrences of a repeated region. It is believed that long reads can help to solve these tasks. In this study, we have designed new algorithms for mapping long reads to reference genomes. We have also designed efficient and effective heuristic algorithms for local alignment of long reads against the corresponding segments of the reference genome. To design the new mapping algorithm, we formulate the problem as the longest common subsequence with distance constraints. The local alignment heuristic is based on the idea of recursive alignment of k-mers, where the value of k differs in each round. We have implemented all the algorithms in C++ and produced a software package named mapAlign. Experiments on real data sets showed that the newly proposed approach can generate better alignments in terms of both identity and alignment scores for both Nanopore and single-molecule real-time (SMRT) sequencing data sets. For the human Nanopore and SMRT data sets, the new method can successfully match/align 91.53% and 85.36% of letters from reads to identical letters on reference genomes, respectively. In comparison, the best known method can only align 88.44% and 79.08% of the letters of reads for the Nanopore and SMRT data sets, respectively. Our method is also faster than the best known method.
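The abstract does not spell out the formulation, so the following is only an illustrative sketch of one way a "longest common subsequence with distance constraints" could be read: exact k-mer matches between a read and a reference serve as anchors, and the longest chain of anchors is selected such that consecutive anchors advance in both sequences by nearly the same amount. The function names `kmer_anchors` and `chain_with_distance_constraint`, the parameter `max_gap_diff`, and the quadratic dynamic program are all assumptions made for illustration. They are not taken from mapAlign.

```python
from collections import defaultdict


def kmer_anchors(read, ref, k):
    """Return sorted (read_pos, ref_pos) pairs where a k-mer matches exactly."""
    index = defaultdict(list)
    for j in range(len(ref) - k + 1):
        index[ref[j:j + k]].append(j)
    anchors = []
    for i in range(len(read) - k + 1):
        for j in index.get(read[i:i + k], ()):
            anchors.append((i, j))
    return sorted(anchors)


def chain_with_distance_constraint(anchors, max_gap_diff):
    """Longest chain of anchors that increases in both coordinates.

    Hypothetical constraint: for consecutive anchors, the distance moved
    in the read and the distance moved in the reference may differ by at
    most max_gap_diff. Runs in O(n^2) time over the n anchors.
    """
    anchors = sorted(anchors)
    n = len(anchors)
    if n == 0:
        return []
    best = [1] * n   # best[b]: length of the longest chain ending at anchor b
    prev = [-1] * n  # prev[b]: predecessor of b in that chain, or -1
    for b in range(n):
        ib, jb = anchors[b]
        for a in range(b):
            ia, ja = anchors[a]
            increasing = ia < ib and ja < jb
            if increasing and abs((ib - ia) - (jb - ja)) <= max_gap_diff:
                if best[a] + 1 > best[b]:
                    best[b] = best[a] + 1
                    prev[b] = a
    # Trace back from the anchor that ends the longest chain.
    end = max(range(n), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(anchors[end])
        end = prev[end]
    return chain[::-1]


if __name__ == "__main__":
    anchors = kmer_anchors("ACGTTGCA", "GGACGTTGCATT", 4)
    print(chain_with_distance_constraint(anchors, 0))
```

In this reading, the distance constraint is what rejects an anchor that is ordered correctly but lies far off the diagonal, such as a spurious match in a repeated region. A round-based scheme with a varying k, as the abstract describes for local alignment, might rerun a similar anchoring step with a smaller k on the gaps between chained anchors, but that is speculation about the design rather than a description of it.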
