4.7 Article

Enhancing Protein Function Prediction Performance by Utilizing AlphaFold-Predicted Protein Structures

Journal

JOURNAL OF CHEMICAL INFORMATION AND MODELING
Volume 62, Issue 17, Pages 4008-4017

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jcim.2c00885

Funding

  1. Shandong Provincial Postdoctoral Program for Innovative Talents [SDBX2020003]
  2. Natural Science Foundation of Shandong Province [ZR2021MF011]

The structure of a protein plays a crucial role in its function. AlphaFold2's predicted protein structures address the limited availability of experimentally solved structures and can supply additional training data for data-driven prediction models. This study evaluated structure-based function prediction models trained with AlphaFold-predicted structures. Adding predicted structures to the training set improved performance, and a model trained only on predicted structures performed comparably to one trained on experimentally solved structures.
The structure of a protein is of great importance in determining its function, and this relationship can be leveraged to train data-driven prediction models. However, the limited number of available protein structures severely limits the performance of these models. AlphaFold2 and its open-source data set of predicted protein structures offer a promising solution to this problem: the predicted structures are expected to improve model performance by increasing the number of training samples. In this work, we constructed a new benchmark data set and implemented a state-of-the-art structure-based approach to determine whether the performance of a function prediction model can be improved by adding AlphaFold-predicted structures to the training set. We further compared two models trained separately on real structures only and on AlphaFold-predicted structures only. Experimental results indicated that structure-based protein function prediction models can benefit from virtual training data consisting of AlphaFold-predicted structures. First, model performance improved in all three categories of Gene Ontology terms (GO terms) after predicted structures were added as training samples. Second, the model trained only on AlphaFold-predicted virtual samples achieved performance comparable to that of the model trained on experimentally solved real structures, suggesting that predicted structures are almost equally effective for predicting protein function.
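
The comparison described above involves three training configurations: real structures only, predicted structures only, and real structures augmented with predicted ones. The following is a minimal sketch of how such configurations might be assembled, not the authors' pipeline. The `Sample` class, the `source` labels, the `build_training_sets` helper, and the toy identifiers are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    protein_id: str       # hypothetical identifier, not taken from the paper
    source: str           # "experimental" or "predicted" (assumed labels)
    go_terms: frozenset   # GO term annotations used as prediction targets


def build_training_sets(samples):
    """Split samples by structure source and form three training configurations."""
    real = [s for s in samples if s.source == "experimental"]
    predicted = [s for s in samples if s.source == "predicted"]
    return {
        "real_only": real,                      # baseline: solved structures
        "predicted_only": predicted,            # AlphaFold-predicted structures
        "real_plus_predicted": real + predicted,  # augmented training set
    }


# Toy data for illustration only; identifiers and GO labels are placeholders.
toy_samples = [
    Sample("protein_a", "experimental", frozenset({"GO:term_x"})),
    Sample("protein_b", "experimental", frozenset({"GO:term_y"})),
    Sample("protein_c", "predicted", frozenset({"GO:term_x", "GO:term_y"})),
]

training_sets = build_training_sets(toy_samples)
for name, subset in training_sets.items():
    print(name, len(subset))
```

Each configuration would then be used to train the same model architecture and be scored on a shared held-out test set, so that any performance difference can be attributed to the training data rather than to the model.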
