Article

Structure-aware protein solubility prediction from sequence through graph convolutional network and predicted contact map

Journal

JOURNAL OF CHEMINFORMATICS
Volume 13, Issue 1, Pages -

Publisher

BMC
DOI: 10.1186/s13321-021-00488-1

Keywords

Protein solubility prediction; Graph neural network; Predicted contact map; Deep learning

Funding

  1. National Key R&D Program of China [2020YFB020003]
  2. National Natural Science Foundation of China [61772566]
  3. Guangdong Key Field R&D Plan [2019B020228001, 2018B010109006]
  4. Introducing Innovative and Entrepreneurial Teams [2016ZT06D211]
  5. Guangzhou S&T Research Plan [202007030010]

This study developed GraphSol, a new structure-aware method that predicts protein solubility with an attentive graph convolutional network (GCN) operating on a protein topology attribute graph constructed from the sequence. The model outperformed other sequence-based methods, remained stable across evaluations, and is the first to use a GCN for sequence-based protein solubility prediction.
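GraphSol builds its graph from a contact map predicted from the sequence and passes it through an attentive GCN. The NumPy sketch below illustrates that general pipeline; it is not GraphSol's actual architecture. The 0.5 contact threshold, the single GCN layer with symmetric normalization, the softmax attention pooling, the sigmoid output, and all function names are hypothetical choices made for illustration, and the random weights stand in for trained parameters.

```python
import numpy as np


def contact_map_to_adjacency(contact_probs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binarize an L x L contact-probability map (hypothetical 0.5 cutoff) and
    apply the symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    adj = (contact_probs >= threshold).astype(float)
    adj = np.maximum(adj, adj.T)        # enforce an undirected graph
    np.fill_diagonal(adj, 1.0)          # self-loops, i.e. A + I
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))  # degree >= 1, so no division by zero
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_hat: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph convolution: aggregate neighbour features, project, apply ReLU."""
    return np.maximum(a_hat @ h @ w, 0.0)


def attention_pool(h: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Collapse per-residue features (L x F) into one protein vector (F,)
    using softmax attention weights over residues."""
    scores = h @ v
    scores = scores - scores.max()      # numerical stability before exp
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ h


def predict_solubility(contact_probs, node_feats, w_gcn, v_att, w_out) -> float:
    """Contact map -> graph -> GCN -> attention pooling -> score in (0, 1)."""
    a_hat = contact_map_to_adjacency(contact_probs)
    h = gcn_layer(a_hat, node_feats, w_gcn)
    g = attention_pool(h, v_att)
    return float(1.0 / (1.0 + np.exp(-(g @ w_out))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, f_in, f_hid = 50, 20, 16                 # residues, input and hidden feature sizes
    probs = rng.random((L, L))
    probs = (probs + probs.T) / 2               # toy symmetric "predicted" contact map
    feats = rng.normal(size=(L, f_in))          # toy per-residue features
    print(predict_solubility(probs, feats,
                             rng.normal(scale=0.1, size=(f_in, f_hid)),
                             rng.normal(size=f_hid),
                             rng.normal(size=f_hid)))
```

The attention pooling step is what lets a variable-length protein map to a single fixed-size vector, so one regression head can score sequences of any length.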
Protein solubility is important for producing new soluble proteins, which can reduce the cost of biocatalysts or therapeutic agents. A computational model that accurately predicts protein solubility from the amino acid sequence is therefore highly desirable. Many methods have been developed, but most rely on one-dimensional embeddings of amino acids, which are limited in capturing spatial structural information. In this study, we developed GraphSol, a structure-aware method that predicts protein solubility with an attentive graph convolutional network (GCN), in which the protein topology attribute graph is constructed from contact maps predicted from the sequence alone. GraphSol substantially outperformed other sequence-based methods. The model proved stable, with a consistent R² of 0.48 in both cross-validation and the independent test on the eSOL dataset. To the best of our knowledge, this is the first study to use a GCN for sequence-based protein solubility prediction. More importantly, this architecture could be easily extended to other protein prediction tasks that start from a raw protein sequence.
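The stability claim rests on R². Assuming the usual coefficient-of-determination definition, R² = 1 − SS_res / SS_tot (the abstract does not say which variant was used), an R² of 0.48 means the model explains roughly half of the variance in measured solubility. A minimal sketch of that computation, with a hypothetical function name:

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined for constant targets")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy solubility values on a 0-1 scale, not eSOL data.
    print(r_squared([0.2, 0.4, 0.6, 0.8], [0.3, 0.4, 0.5, 0.8]))
```

Under this definition a model that always predicts the mean scores 0, and a perfect model scores 1.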
