3.8 Article

End-to-End Deep Learning Model to Predict and Design Secondary Structure Content of Structural Proteins

Journal

ACS BIOMATERIALS SCIENCE & ENGINEERING
Volume 8, Issue 3, Pages 1156-1165

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acsbiomaterials.1c01343

Keywords

deep learning; protein structure; secondary structure; structural proteins; artificial intelligence; materiomics

Funding

  1. MIT-IBM AI lab
  2. ONR [N000141612333, N000141912375]
  3. AFOSR [FATE MURI FA9550-15-1-0514]
  4. NIH [U01 EB014976]
  5. ARO [W911NF1920098]
  6. Ministry of Science and Technology in Taiwan [MOST 109-2222-E-006-005-MY2, MOST 109-2224-E-007-003-]

This article reports a deep learning model that predicts the secondary structure content of proteins from their primary sequences. The model, using convolutional and recurrent architectures, accurately predicts the content of alpha-helix and beta-sheet structures. The predictions show excellent agreement with newly identified protein structures and have the potential for rapidly designing proteins with specific secondary structures.
Structural proteins are the basis of many biomaterials and are key construction and functional components of all life. Further, it is well known that the diversity of protein function relies on local structures derived from the primary amino acid sequence. Here, we report a deep learning model that predicts the secondary structure content of proteins directly from primary sequences, with high computational efficiency. Understanding the secondary structure content of proteins is crucial to designing proteins with targeted material functions, especially mechanical properties. Using convolutional and recurrent architectures and natural language models, our deep learning model predicts the content of two essential types of secondary structure, the alpha-helix and the beta-sheet. The training data are collected from the Protein Data Bank and contain many existing protein geometries. We find that our model can learn hidden features, as patterns in the input sequences, that can then be directly related to secondary structure content. The alpha-helix and beta-sheet content predictions show excellent agreement with the training data and with newly deposited protein structures that were not included in the original training set. We further demonstrate the model's capabilities through a search for de novo protein sequences that maximize or minimize alpha-helix or beta-sheet content, and we compare the predictions with folded models of these sequences based on AlphaFold2. Excellent agreement is found, underscoring that our model has predictive potential for rapidly designing proteins with specific secondary structures and could be widely applied in biomedical industries, including protein biomaterial design and regenerative medicine applications.
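
To make the predicted quantity concrete, the following minimal sketch computes alpha-helix and beta-sheet content as the fraction of residues assigned to each class. It assumes a per-residue secondary structure string in DSSP-style one-letter codes is available; the function name, the example string, and the choice to count only `H` as helix and only `E` as sheet are assumptions made for illustration and are not taken from the article.

```python
from collections import Counter

# DSSP-style one-letter codes. Which codes count toward each class is an
# assumption of this sketch, not a detail confirmed by the article.
HELIX_CODES = {"H"}  # alpha-helix
SHEET_CODES = {"E"}  # extended strand in a beta-sheet


def secondary_structure_content(ss_string):
    """Return (helix_fraction, sheet_fraction) for a per-residue string."""
    if not ss_string:
        raise ValueError("empty secondary structure string")
    counts = Counter(ss_string)
    n_residues = len(ss_string)
    helix = sum(counts[code] for code in HELIX_CODES) / n_residues
    sheet = sum(counts[code] for code in SHEET_CODES) / n_residues
    return helix, sheet


# Hypothetical 20-residue assignment: 8 helix residues, 4 strand residues.
example = "--HHHHHHHH--EEEE--TT"
helix, sheet = secondary_structure_content(example)
print(f"alpha-helix: {helix:.2f}, beta-sheet: {sheet:.2f}")
# prints "alpha-helix: 0.40, beta-sheet: 0.20"
```

A pair of fractions like this is the kind of target a sequence-to-content model would be trained to regress, and the kind of value a sequence search would try to push toward its maximum or minimum.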
