Article

UbiSitePred: A novel method for improving the accuracy of ubiquitination sites prediction by using LASSO to select the optimal Chou's pseudo components

Journal

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS
Volume 184, Pages 28-43

Publisher

ELSEVIER
DOI: 10.1016/j.chemolab.2018.11.012

Keywords

Ubiquitination sites; Binary encoding; Pseudo-amino acid composition; Composition of k-spaced amino acid pairs; Position-specific propensity matrices; Least absolute shrinkage and selection operator

Funding

  1. National Natural Science Foundation of China [61863010, 11771188]
  2. Natural Science Foundation of Shandong Province of China [ZR2018MC007]
  3. Project of Shandong Province Higher Educational Science and Technology Program [J17KA159]
  4. National Science Foundation [ACI-1548562]

Ubiquitination is an essential protein post-translational modification that plays a crucial role in cellular activities such as proteasomal degradation, transcriptional regulation, and DNA damage repair. Identifying ubiquitination sites is therefore a key step toward understanding the molecular mechanisms of ubiquitination. However, experimentally verifying large numbers of ubiquitination sites is time-consuming and costly, so computational approaches for predicting them are needed. This paper proposes UbiSitePred, a new method for predicting ubiquitination sites that combines least absolute shrinkage and selection operator (LASSO) feature selection with a support vector machine (SVM). First, binary encoding (BE), pseudo-amino acid composition (PseAAC), the composition of k-spaced amino acid pairs (CKSAAP), and position-specific propensity matrices (PSPM) are used to extract sequence features, yielding the initial feature space. Second, LASSO is applied to remove redundant features and select the optimal feature subset. Finally, the optimal feature subset is fed into the SVM to predict ubiquitination sites. Five-fold cross-validation shows that UbiSitePred achieves better prediction performance than other methods, with AUC values of 0.9998, 0.8887, and 0.8481 on Set1, Set2, and Set3, respectively, and overall accuracies of 98.33%, 81.12%, and 76.90%, respectively. These results demonstrate that the proposed method is significantly superior to other state-of-the-art prediction methods, and it offers a new approach to predicting other protein post-translational modification sites. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/UbiSitePred/.
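The encode-then-select pipeline described above can be sketched in miniature. This is a toy illustration, not the UbiSitePred code (which is in the linked GitHub repository): it implements only two of the four encodings (BE and CKSAAP; PseAAC and PSPM are omitted), and the window length of 7, the padding symbol `X`, `k_max`, the penalty `alpha`, the sweep count, and the peptide windows are all hypothetical choices. LASSO is fitted here by plain cyclic coordinate descent on centered features, regressing the 0/1 labels, and the features with nonzero coefficients form the selected subset.

```python
"""Toy sketch: BE + CKSAAP encoding of peptide windows, then LASSO selection.

Not the UbiSitePred implementation; PAD, k_max, alpha, n_sweeps, and the
demo windows are hypothetical choices for illustration only.
"""

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
PAD = "X"  # hypothetical pseudo-residue for windows that run past a terminus


def binary_encoding(window):
    """One-hot encode every position over the 20 residues plus PAD (21 bits each)."""
    symbols = ALPHABET + PAD
    features = []
    for residue in window:
        features.extend(1.0 if residue == s else 0.0 for s in symbols)
    return features


def cksaap(window, k_max=1):
    """Frequencies of the 400 residue pairs separated by k residues, k = 0..k_max.

    Each count is divided by the number of k-spaced positions in the window;
    pairs containing PAD count toward that denominator but not as a pair type.
    """
    pair_types = [a + b for a in ALPHABET for b in ALPHABET]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pair_types, 0)
        n_pairs = len(window) - k - 1
        for i in range(n_pairs):
            pair = window[i] + window[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        features.extend(counts[p] / n_pairs for p in pair_types)
    return features


def soft_threshold(x, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def lasso_select(X, y, alpha=0.05, n_sweeps=100):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + alpha*||b||_1.

    X and y are centred first, so no intercept is fitted. Returns the indices
    of features with nonzero coefficients and the full coefficient vector.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    residual = [yi - y_mean for yi in y]  # residual while every coefficient is 0
    beta = [0.0] * p
    sq_norm = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if sq_norm[j] == 0.0:
                continue  # constant column carries no information; keep it at 0
            # Correlation of column j with the partial residual (feature j left out).
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, alpha) / sq_norm[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                beta[j] = new_beta
    selected = [j for j in range(p) if beta[j] != 0.0]
    return selected, beta


if __name__ == "__main__":
    # Made-up 7-residue windows centred on lysine (K); labels are invented.
    positives = ["AGLKRSE", "XXMKDLE", "TRLKEQA", "GSFKVLE"]
    negatives = ["PPAKPPG", "CWHKNYC", "QQDKTTW", "MHNKCYP"]
    windows = positives + negatives
    labels = [1.0] * len(positives) + [0.0] * len(negatives)
    X = [binary_encoding(w) + cksaap(w, k_max=1) for w in windows]
    selected, _ = lasso_select(X, labels, alpha=0.05)
    print(f"{len(X[0])} features in the initial space, {len(selected)} kept by LASSO")
```

In the full method, the columns indexed by `selected` (drawn from all four encodings) would then be passed to an SVM. With scikit-learn, for example, this could be done with `sklearn.svm.SVC` and evaluated by five-fold cross-validation via `sklearn.model_selection.cross_val_score(clf, X_selected, y, cv=5)`, and `sklearn.linear_model.Lasso` could replace the hand-written solver. The exact window length, penalty value, and SVM settings used by UbiSitePred should be taken from the paper and repository rather than from this sketch.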
