Article

Multi-head attention-based U-Nets for predicting protein domain boundaries using 1D sequence features and 2D distance maps

Journal

BMC BIOINFORMATICS
Volume 23, Issue 1

Publisher

BMC
DOI: 10.1186/s12859-022-04829-1

Keywords

Protein sequence; Protein domain boundary; Attention; Protein distance map; Deep learning

Funding

  1. Department of Energy [DE-AR0001213, DE-SC0020400, DE-SC0021303]
  2. NSF [DBI1759934, IIS1763246]
  3. NIH [R01GM093123]
  4. Office of Science of the U.S. Department of Energy [DE-AC05-00OR22725]

In this study, a deep learning method called DistDom is developed to predict protein domain boundaries from 1D sequence features and a predicted 2D inter-residue distance map. The method outperforms state-of-the-art techniques in accuracy and F1 measure on both single-domain and multi-domain proteins.
Information about the domain architecture of proteins is useful for studying protein structure and function. However, accurate prediction of protein domain boundaries (i.e., the sequence regions separating two domains) from sequence remains a significant challenge. In this work, we develop a deep learning method based on multi-head U-Nets, called DistDom, that predicts protein domain boundaries using 1D sequence features and a predicted 2D inter-residue distance map as input. The 1D features contain the evolutionary and physicochemical information of protein sequences, whereas the 2D distance map carries structural information that has rarely been used in domain boundary prediction. The 1D and 2D features are processed by 1D and 2D U-Nets, respectively, to generate hidden features. Multi-head attention then uses these hidden features to predict, for each residue of a protein, the probability that it lies in a domain boundary, leveraging both local and global information in the features. The residue-level domain boundary predictions can also be used to classify proteins as single-domain or multi-domain. DistDom classifies the CASP14 single-domain and multi-domain targets with an accuracy of 75.9%, 13.28% more accurate than the state-of-the-art method. Tested on the CASP14 multi-domain protein targets with expert-annotated domain boundaries, DistDom achieves an average per-target F1 score of 0.263 for domain boundary prediction, 29.56% higher than the state-of-the-art method.
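To make the two evaluation tasks in the abstract concrete, the following is a minimal Python sketch of how per-residue boundary probabilities might be turned into a single-domain/multi-domain call and scored with a per-target F1. It is not the DistDom implementation: the function names, the 0.5 probability threshold, and the exact-match definition of a correct boundary residue are all illustrative assumptions, and the paper's actual protocol (for example, any tolerance window around annotated boundaries) may differ.

```python
from typing import List


def boundary_residues(probs: List[float], threshold: float = 0.5) -> List[int]:
    """Return indices of residues whose boundary probability meets the threshold.

    The 0.5 default is an assumption made for illustration only.
    """
    return [i for i, p in enumerate(probs) if p >= threshold]


def classify_protein(probs: List[float], threshold: float = 0.5) -> str:
    """Call a protein multi-domain if any residue is predicted as a boundary."""
    return "multi-domain" if boundary_residues(probs, threshold) else "single-domain"


def boundary_f1(predicted: List[int], true: List[int]) -> float:
    """Per-target F1 over boundary residue indices, using exact index matches."""
    pred_set, true_set = set(predicted), set(true)
    tp = len(pred_set & true_set)  # residues that are both predicted and annotated
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(true_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-residue probabilities for a six-residue toy sequence.
probs = [0.1, 0.2, 0.7, 0.9, 0.6, 0.1]
predicted = boundary_residues(probs)
label = classify_protein(probs)
score = boundary_f1(predicted, [3, 4, 5])  # [3, 4, 5] is a made-up annotation
```

Averaging `boundary_f1` over all multi-domain targets would give a per-target mean comparable in form, though not necessarily in definition, to the F1 figure reported above.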