Article

Data Size and Quality Matter: Generating Physically-Realistic Distance Maps of Protein Tertiary Structures

Journal

BIOMOLECULES
Volume 12, Issue 7

Publisher

MDPI
DOI: 10.3390/biom12070908

Keywords

protein molecule; tertiary structure; multi-structure view; generative model; variational autoencoder; spatial pyramidal pooling; training set configuration; disentanglement

Funding

  1. National Science Foundation [1907805, 1900061, 1763233]
  2. Direct For Computer & Info Scie & Enginr
  3. Div Of Information & Intelligent Systems [1763233] Funding Source: National Science Foundation
  4. Direct For Computer & Info Scie & Enginr
  5. Div Of Information & Intelligent Systems [1907805] Funding Source: National Science Foundation
  6. Division of Computing and Communication Foundations
  7. Direct For Computer & Info Scie & Enginr [1900061] Funding Source: National Science Foundation

With the debut of AlphaFold2, it is now possible to obtain a highly-accurate view of the equilibrium tertiary structure of a protein molecule. However, the single-structure view is not sufficient to account for the structural plasticity of protein molecules. This research advances the capabilities of deep learning models to learn from experimentally-available tertiary structures of proteins and explores the role of the composition of the training dataset in learning key patterns. The authors also propose a disentangled latent variable model that improves upon existing models, opening up new avenues of research for computing multi-structure views of protein molecules.
With the debut of AlphaFold2, we can now obtain a highly accurate view of a reasonable equilibrium tertiary structure of a protein molecule. Yet a single-structure view is insufficient: it does not account for the high structural plasticity of protein molecules. Obtaining a multi-structure view of a protein molecule remains an outstanding challenge in computational structural biology. Alongside methods formulated under the umbrella of stochastic optimization, we are now seeing rapid advances in the capabilities of deep learning-based methods. In recent work, we advanced the capability of these models to learn from experimentally available tertiary structures of protein molecules of varying lengths. In this work, we elucidate the important role that the composition of the training dataset plays in the neural network's ability to learn key local and distal patterns in tertiary structures. To make such patterns visible to the network, we utilize a contact map-based representation of protein tertiary structure. We show interesting relationships between data size, quality, and composition on the one hand and the ability of latent variable models to learn key patterns of tertiary structure on the other. In addition, we present a disentangled latent variable model that improves upon the state-of-the-art variational autoencoder-based model in capturing key, physically realistic structural patterns. We believe this work opens further avenues of research on deep learning-based models for computing multi-structure views of protein molecules.
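To make the map-based representation concrete, here is a minimal NumPy sketch of how a distance map and a binary contact map can be derived from a tertiary structure. It assumes one 3D point per residue (typically the C-alpha atom) and an 8 Å contact cutoff, a common convention in the field; the function names `distance_map` and `contact_map` are hypothetical, and the atoms, cutoff, and any normalization used in the paper itself may differ.

```python
import numpy as np


def distance_map(coords: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between residue positions.

    coords: (L, 3) array with one 3D point per residue
    (assumed here: C-alpha atoms, in angstroms).
    Returns a symmetric (L, L) matrix with zeros on the diagonal.
    """
    coords = np.asarray(coords, dtype=float)
    # Broadcasting (L, 1, 3) - (1, L, 3) gives all pairwise difference vectors.
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def contact_map(dmap: np.ndarray, cutoff: float = 8.0) -> np.ndarray:
    """Binary contact map: 1 where two residues lie closer than `cutoff`.

    The 8 A default is a common convention, assumed for illustration only.
    """
    return (dmap < cutoff).astype(np.int8)


# Toy example: four residues on a straight line, 3.8 A apart
# (roughly the distance between consecutive C-alpha atoms).
coords = np.array([[i * 3.8, 0.0, 0.0] for i in range(4)])
dmap = distance_map(coords)
cmap = contact_map(dmap)
print(cmap)
```

On this toy chain, residues up to two positions apart (7.6 Å) count as contacts, while the two ends (11.4 Å) do not. The distance map keeps the full geometric detail, whereas the contact map reduces it to a binary pattern of local (near-diagonal) and distal (off-diagonal) contacts, the kind of patterns the abstract refers to.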
