4.7 Article

SaAlign: Multiple DNA/RNA sequence alignment and phylogenetic tree construction tool for ultra-large datasets and ultra-long sequences based on suffix array

期刊

出版社

ELSEVIER
DOI: 10.1016/j.csbj.2022.03.018

关键词

Sequence analysis; Alignment; Phylogenetic tree; Suffix array

资金

  1. Sichuan Science and Technology Program [2021YFN0119]
  2. National Natural Science Foundation of China [31870240]

向作者/读者索取更多资源

Multiple DNA/RNA sequence alignment is a crucial tool in bioinformatics for phylogenetic tree construction. By optimizing a dynamic programming algorithm using the longest common substring method, we developed an efficient alignment tool that outperforms existing tools for large datasets and long sequences.
Multiple DNA/RNA sequence alignment is an important fundamental tool in bioinformatics, especially for phylogenetic tree construction. With DNA-sequencing improvements, the amount of bioinformatics data is constantly increasing, and various tools need to be iterated constantly. Mitochondrial genome analyses of multiple individuals and species require bioinformatics software; therefore, their performances need to be optimized. To improve the alignment of ultra-large datasets and ultra-long sequences, we optimized a dynamic programming algorithm using longest common substring methods. Ultra-large test DNA datasets, containing sequences of different lengths, some over 300 kb (kilobase), revealed that the Multiple DNA/RNA Sequence Alignment Tool Based on Suffix Tree (SaAlign) saved time and computational space. It outperformed the existing technical tools, including MAFFT and HAlign-II. For mitochondrial genome datasets having limited numbers of sequences, MAFFT performed the required tasks, but it could not handle ultra-large mitochondrial genome datasets for core dump error. We implement a multiple DNA/RNA sequence alignment tool based on Center Star strategy and use suffix array algorithm to optimize the spatial and time efficiency. Nowadays, whole-genome research and NGS technology are becoming more popular, and it is necessary to save computational resources for laboratories. That software is of great significance in these aspects, especially in the study of the whole-mitochondrial genome of plants. (C) 2022 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据