Article

DeepPPSite: A deep learning-based model for analysis and prediction of phosphorylation sites using efficient sequence information

Journal

ANALYTICAL BIOCHEMISTRY
Volume 612

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.ab.2020.113955

Keywords

Post-translational modification; Phosphorylation sites; Deep learning; Stacked long short-term memory; Sequence feature information

Funding

  1. National Natural Science Foundation of China [62072243, 61772273]
  2. Fundamental Research Funds for the Central Universities [30918011104]


Phosphorylation is a common type of post-translational modification that plays crucial roles in protein function, with abnormal phosphorylation linked to various diseases. Current wet-lab technologies for phosphorylation site identification are costly and time-consuming, highlighting the need for efficient computational algorithms. The newly introduced deep learning-based predictor, DeepPPSite, achieves superior performance in predicting phosphorylation sites by utilizing a stacked long short-term memory recurrent network to learn protein representations from protein descriptors.
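The summary above describes DeepPPSite as a stacked long short-term memory (LSTM) recurrent network that learns protein representations from protein descriptors. The following plain-Python sketch shows how a generic two-layer stacked LSTM could map a peptide window around a candidate residue to a phosphorylation probability. It is not the authors' implementation. The one-hot encoding stands in for the conjoint protein descriptors, which are not specified here. The window length, layer sizes, and all names (`one_hot`, `LSTMLayer`, `StackedLSTMClassifier`) are hypothetical. The weights are random and untrained, so the output only illustrates the data flow.

```python
import math
import random

# Assumption: one-hot encoding over the 20 standard amino acids stands in for
# DeepPPSite's conjoint protein descriptors, which are not reproduced here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Encode each residue as a 20-dim vector; unknown symbols become all zeros."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMLayer:
    """One LSTM layer with small random (untrained) weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # Separate weights and biases for the input (i), forget (f),
        # candidate (g), and output (o) gates.
        self.W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                         for _ in range(hidden_size)] for gate in "ifgo"}
        self.b = {gate: [0.0] * hidden_size for gate in "ifgo"}

    def forward(self, xs):
        """Process a sequence of vectors and return the hidden state at every step."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in xs:
            z = x + h  # concatenate current input with previous hidden state
            pre = {gate: [sum(w * v for w, v in zip(row, z)) + bias
                          for row, bias in zip(self.W[gate], self.b[gate])]
                   for gate in "ifgo"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            cand = [math.tanh(v) for v in pre["g"]]
            o = [sigmoid(v) for v in pre["o"]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


class StackedLSTMClassifier:
    """Stacked LSTM layers followed by a logistic output unit (hypothetical sizes)."""

    def __init__(self, input_size, hidden_sizes, seed=0):
        rng = random.Random(seed)
        self.layers = []
        size = input_size
        for hidden in hidden_sizes:
            self.layers.append(LSTMLayer(size, hidden, rng))
            size = hidden  # the next layer reads this layer's hidden states
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(size)]
        self.b_out = 0.0

    def predict_proba(self, xs):
        """Return P(site is phosphorylated) for one encoded window."""
        for layer in self.layers:
            xs = layer.forward(xs)  # the full sequence flows into the next layer
        last = xs[-1]  # final hidden state of the top layer summarises the window
        return sigmoid(sum(w * v for w, v in zip(self.w_out, last)) + self.b_out)


# Hypothetical 11-residue window centred on a serine (index 5).
window = "PEARKSPVDTG"
model = StackedLSTMClassifier(input_size=len(AMINO_ACIDS), hidden_sizes=[16, 8])
p = model.predict_proba(one_hot(window))
print(f"P(phosphorylated) = {p:.3f}")
```

Stacking means that each LSTM layer consumes the entire hidden-state sequence of the layer beneath it rather than only its last state. In a trained model, the weights would be fitted with a loss such as binary cross-entropy.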
Phosphorylation is a ubiquitous type of post-translational modification (PTM) that occurs in both eukaryotic and prokaryotic cells, wherein a phosphate group is attached to specific amino acid residues, namely serine (S), threonine (T), and tyrosine (Y). These phosphorylated residues exhibit diverse functions at the molecular level. Recent studies have determined that some diseases, such as cancer, diabetes, and neurodegenerative diseases, are caused by abnormal phosphorylation. Given its potential applications in biological research and drug development, the large-scale identification of phosphorylation sites has attracted considerable interest. Existing wet-lab technologies for identifying phosphorylation sites are costly and time-consuming. Thus, computational algorithms that can efficiently accelerate the annotation of phosphorylation sites across massive collections of protein sequences are needed. Numerous machine learning-based methods have been developed for phosphorylation site prediction. However, despite extensive efforts, existing computational approaches still show inadequate performance, particularly in terms of overall accuracy (ACC), Matthews correlation coefficient (MCC), and area under the ROC curve (AUC). In this paper, we report DeepPPSite, a novel deep learning-based predictor of phosphorylation sites built on a stacked long short-term memory (LSTM) recurrent network, to overcome these performance hurdles. The proposed model efficiently learns protein representations from conjoint protein descriptors. In 10-fold cross-validation on the training datasets, DeepPPSite achieved superior performance, with MCC values of 0.608, 0.602, and 0.558 for S, T, and Y sites, respectively. We further assessed the generalization ability of DeepPPSite with a rigorous independent test, in which it achieved MCC values of 0.358, 0.356, and 0.350 for S, T, and Y phosphorylation sites, respectively.
Rigorous cross-validation and independent validation tests for the three types of phosphorylation sites demonstrated that DeepPPSite significantly outperforms state-of-the-art methods.
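The abstract reports MCC values under 10-fold cross-validation. The following plain-Python sketch shows how such numbers are typically computed. It defines the standard Matthews correlation coefficient, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), together with a 10-fold split. Two points are assumptions for illustration: that the folds are stratified (each fold keeps a similar positive/negative ratio), and the helper names `confusion`, `mcc`, and `stratified_k_fold`. The paper's exact fold construction, and whether MCC is pooled across folds or averaged per fold, are not described here.

```python
import math
import random


def confusion(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = phosphorylation site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs with a similar class ratio in every fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin into the folds
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, sorted(test)


# Hypothetical data: 30 positive and 70 negative candidate sites.
labels = [1] * 30 + [0] * 70
folds = list(stratified_k_fold(labels, k=10))
example = mcc(*confusion([1, 1, 0, 0], [1, 0, 0, 0]))
print(f"MCC = {example:.3f}")
```

MCC uses all four confusion-matrix cells. This makes it more informative than accuracy when negative sites greatly outnumber positive ones, which is common in phosphorylation datasets.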
