4.7 Article

Predicting phosphorylation sites using machine learning by integrating the sequence, structure, and functional information of proteins

Journal

JOURNAL OF TRANSLATIONAL MEDICINE
Volume 19, Issue 1, Pages -

Publisher

BMC
DOI: 10.1186/s12967-021-02851-0

Keywords

Post-translational modification; MRMR; Symmetrical uncertainty; Random forest; Support vector machine

Funding

  1. Department of Health Research (DHR), India
  2. DST Purse Grant


Machine-learning-based predictors utilizing physicochemical, sequence, structural, and functional information were proposed in this study to classify S/T/Y phosphorylation sites. Rigorous feature selection methods were employed to extract informative features. The RF and SVM models generated showed high accuracy, outperforming existing methods in predicting protein phosphorylation.
Background: Post-translational modification (PTM) is a biological process that alters proteins and is therefore involved in the regulation of various cellular activities and in pathogenesis. Protein phosphorylation is an essential process and one of the most-studied PTMs: it occurs when a phosphate group is added to a serine (Ser, S), threonine (Thr, T), or tyrosine (Tyr, Y) residue. Dysregulation of protein phosphorylation can lead to various diseases (most commonly neurological disorders, Alzheimer's disease, and Parkinson's disease), which makes it necessary to predict which S/T/Y residues in an uncharacterized amino acid sequence can be phosphorylated. Despite a surplus of sequencing data, current experimental methods of identifying PTMs are time-consuming, costly, and error-prone, so a number of computational methods have been proposed to replace them. However, phosphorylation prediction remains limited, owing to substrate specificity, performance, and the diversity of its features.
Methods: In the present study we propose machine-learning-based predictors that use the physicochemical, sequence, structural, and functional information of proteins to classify S/T/Y phosphorylation sites. Rigorous feature selection, using the minimum redundancy/maximum relevance (mRMR) approach and the symmetrical uncertainty method, was employed to extract the most informative features for training the models.
Results: The random forest (RF) and support vector machine (SVM) models generated using diverse feature types in the present study were highly accurate, as shown by good values across several statistical measures. Moreover, independent test sets and benchmark validations indicated that the proposed method clearly outperformed the existing methods, demonstrating its ability to accurately predict protein phosphorylation.
Conclusions: The results obtained in the present work indicate that the proposed computational methodology can be effectively used to predict putative phosphorylation sites, further facilitating the discovery of the mechanisms of various biological processes.
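The abstract names symmetrical uncertainty as one of its feature selection methods but does not show how it is computed. The sketch below illustrates the standard definition for discrete variables, SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), and uses it to rank features against a class label. It is not the authors' implementation: the feature names and toy data are hypothetical, and continuous features would first need to be discretized, a step not shown here.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); lies between 0 and 1."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so nothing is shared
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # mutual information I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def rank_features(features, labels):
    """Return (name, SU) pairs sorted by decreasing SU with the label."""
    scores = {
        name: symmetrical_uncertainty(column, labels)
        for name, column in features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up toy data: 1 = phosphorylated site, 0 = non-phosphorylated site.
labels = [1, 1, 0, 0, 1, 0]
features = {
    "hypothetical_feature_a": [1, 1, 0, 0, 1, 0],  # identical to the label
    "hypothetical_feature_b": [0, 1, 0, 1, 0, 1],  # weakly related
    "hypothetical_feature_c": [1, 1, 1, 1, 1, 1],  # constant, uninformative
}

ranking = rank_features(features, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a selection step of this kind, features whose score falls below a chosen threshold would be dropped before training a classifier. The threshold is a design choice that the abstract does not specify.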
