Article

A novel sequence alignment algorithm based on deep learning of the protein folding code

BIOINFORMATICS (2021)

Journal

BIOINFORMATICS

Volume 37, Issue 4, Pages 490-496

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btaa810

Categories

Biochemical Research Methods; Biotechnology & Applied Microbiology; Computer Science, Interdisciplinary Applications; Mathematical & Computational Biology; Statistics & Probability

Funding

Division of General Medical Sciences of the National Institutes of Health [NIH] [R35-118039]

Smart Summary

The SAdLSA algorithm learns the protein folding code from experimentally determined protein structures, improving the detection of structural relationships in sequence comparisons. It shows significant improvement over established approaches on challenging datasets, with a time complexity of O(N) thanks to GPU acceleration.

Abstract

Motivation: From evolutionary inference and function annotation to structure prediction, protein sequence comparison has provided crucial biological insights. While many sequence alignment algorithms have been developed, existing approaches often cannot detect hidden structural relationships in the 'twilight zone' of low sequence identity. To address this critical problem, we introduce a computational algorithm that performs protein Sequence Alignments from deep-Learning of Structural Alignments (SAdLSA, silent 'd'). The key idea is to implicitly learn the protein folding code from many thousands of structural alignments of experimentally determined protein structures.

Results: To demonstrate that the folding code was learned, we first show that SAdLSA trained on pure alpha-helical proteins successfully recognizes pairs of structurally related pure beta-sheet protein domains. Subsequent training and benchmarking on larger, highly challenging datasets show significant improvement over established approaches. For challenging cases, SAdLSA is ~150% better than HHsearch for generating pairwise alignments and ~50% better for identifying the proteins with the best alignments in a sequence library. The time complexity of SAdLSA is O(N) thanks to GPU acceleration.
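The abstract describes SAdLSA only at a high level. To illustrate the final step any such method needs (turning a learned residue-pair scoring matrix into an alignment, then ranking library entries by alignment quality), here is a minimal sketch. It assumes a network has already produced `score[i][j]` for every query/template residue pair and applies textbook Smith-Waterman local alignment with a linear gap penalty. The function names `local_align` and `rank_library`, the gap value, and the toy matrix are hypothetical, and this is not claimed to be SAdLSA's actual alignment procedure.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def local_align(score: Matrix, gap: float = 1.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Smith-Waterman local alignment over a precomputed residue-pair score matrix.

    score[i][j] is a hypothetical learned score for aligning query residue i with
    template residue j (positive = likely structurally equivalent).
    gap is an illustrative linear gap penalty (an assumption, not from the paper).
    Returns the best local alignment score and the aligned (i, j) residue pairs.
    """
    n = len(score)
    m = len(score[0]) if n else 0
    # H[i][j]: best local alignment score ending at query i-1 and template j-1.
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h = max(
                0.0,                                     # start a new local alignment
                H[i - 1][j - 1] + score[i - 1][j - 1],   # align residue pair (i-1, j-1)
                H[i - 1][j] - gap,                       # gap in the template
                H[i][j - 1] - gap,                       # gap in the query
            )
            H[i][j] = h
            if h > best:
                best, best_pos = h, (i, j)
    # Trace back from the best cell until a zero cell ends the local alignment.
    pairs: List[Tuple[int, int]] = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


def rank_library(library: Dict[str, Matrix], gap: float = 1.0) -> List[Tuple[str, float]]:
    """Rank library entries by best local alignment score, highest first."""
    results = [(name, local_align(mat, gap)[0]) for name, mat in library.items()]
    return sorted(results, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    toy = [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]
    print(local_align(toy))  # (6.0, [(0, 0), (1, 1), (2, 2)])
```

This pure-Python dynamic program costs O(nm) per query/template pair. The O(N) figure in the abstract concerns SAdLSA's GPU-accelerated pipeline, which this sketch does not attempt to reproduce.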
