Journal
ANALYTICAL BIOCHEMISTRY
Volume 612, Issue -, Pages -
Publisher
ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.ab.2020.113955
Keywords
Post-translation modification; Phosphorylation sites; Deep learning; Stacked long short term memory; Sequence feature information
Funding
- National Natural Science Foundation of China [62072243, 61772273]
- Fundamental Research Funds for the Central Universities [30918011104]
Phosphorylation is a common type of post-translational modification that plays crucial roles in protein function, with abnormal phosphorylation linked to various diseases. Current wet-lab technologies for phosphorylation site identification are costly and time-consuming, highlighting the need for efficient computational algorithms. The newly introduced deep learning-based predictor, DeepPPSite, achieves superior performance in predicting phosphorylation sites by utilizing a stacked long short-term memory recurrent network to learn protein representations from protein descriptors.
Phosphorylation is a ubiquitous type of post-translational modification (PTM) that occurs in both eukaryotic and prokaryotic cells, wherein a phosphate group binds to amino acid residues. These specific residues, i.e., serine (S), threonine (T), and tyrosine (Y), exhibit diverse functions at the molecular level. Recent studies have determined that some diseases, such as cancer, diabetes, and neurodegenerative diseases, are caused by abnormal phosphorylation. Owing to its potential applications in biological research and drug development, the large-scale identification of phosphorylation sites has attracted interest. Existing wet-lab technologies for identifying phosphorylation sites are costly and time-consuming. Thus, computational algorithms that can efficiently accelerate the annotation of phosphorylation sites from massive protein sequences are needed. Numerous machine learning-based methods have been implemented for phosphorylation site prediction. However, despite extensive efforts, existing computational approaches continue to have inadequate performance, particularly in terms of overall accuracy (ACC), Matthews correlation coefficient (MCC), and area under the ROC curve (AUC). In this paper, we report DeepPPSite, a novel deep learning-based predictor of phosphorylation sites constructed using a stacked long short-term memory recurrent network, to overcome these performance hurdles. The proposed technique efficiently learns protein representations from conjoint protein descriptors. The experimental results indicated that our model achieved superior performance on the training dataset for S, T, and Y, with MCC values of 0.608, 0.602, and 0.558, respectively, in a 10-fold cross-validation test. We further assessed the generalization ability of DeepPPSite by conducting a rigorous independent test, in which the predictive MCC values were 0.358, 0.356, and 0.350 for the S, T, and Y phosphorylation sites, respectively.
Rigorous cross-validation and independent validation tests for the three types of phosphorylation sites demonstrated that the designed DeepPPSite tool significantly outperforms state-of-the-art methods.
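The results above are reported as MCC values. For readers unfamiliar with the metric, the following is a minimal sketch of how MCC can be computed for a binary site/non-site classifier from confusion-matrix counts. The function names and the toy labels are hypothetical and chosen for illustration only; this is not the evaluation code used for DeepPPSite.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = phosphorylation site, 0 = non-site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero (a common convention,
    assumed here), e.g. when only one class is present or predicted.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy example with made-up labels, not data from the paper.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 1]
mcc = matthews_corrcoef(*confusion_counts(y_true, y_pred))
print(f"MCC = {mcc:.3f}")
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction). Because it uses all four confusion-matrix cells, it is often preferred over accuracy when positive and negative sites are imbalanced.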