4.5 Article

Prediction of protein ubiquitination sites via multi-view features based on eXtreme gradient boosting classifier

Journal

JOURNAL OF MOLECULAR GRAPHICS & MODELLING
Volume 107, Issue -, Pages -

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.jmgm.2021.107962

Keywords

Ubiquitination sites; Multi-view features; LASSO; SMOTE; XGBoost

Funding

  1. National Natural Science Foundation of China [61863010]
  2. Key Research and Development Program of Shandong Province of China [2019GGX101001]
  3. Key Laboratory Open Foundation of Hainan Province [JSKX202001]
  4. Science and Technology Project of Colleges in Shandong Province [J17KB122]

Ask authors/readers for more resources

This study introduces a ubiquitination site prediction method, UbiSite-XGBoost, based on multi-view features. Experimental results demonstrate that this method is superior to other ubiquitination prediction methods.
Ubiquitination is a common and reversible post-translational protein modification that regulates apoptosis and plays an important role in protein degradation and cell diseases. However, experimental identification of protein ubiquitination sites is usually time-consuming and labor-intensive, so it is necessary to establish effective predictors. In this study, we propose a ubiquitination sites prediction method based on multi-view features, namely UbiSite-XGBoost. Firstly, we use seven single-view features encoding methods to convert protein sequence fragments into digital information. Secondly, the least absolute shrinkage and selection operator (LASSO) is applied to remove the redundant information and get the optimal feature subsets. Finally, these features are inputted into the eXtreme gradient boosting (XGBoost) classifier to predict ubiquitination sites. Five-fold crossvalidation shows that the AUC values of Set1-Set6 datasets are 0.8258, 0.7592, 0.7853, 0.8345, 0.8979 and 0.8901, respectively. The synthetic minority oversampling technique (SMOTE) is employed in Set4-Set6 unbalanced datasets, and the AUC values are 0.9777, 0.9782 and 0.9860, respectively. In addition, we have constructed three independent test datasets which the AUC values are 0.8007, 0.6897 and 0.7280, respectively. The results show that the proposed method UbiSite-XGBoost is superior to other ubiquitination prediction methods and it provides new guidance for the identification of ubiquitination sites. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/UbiSite-XGBoost/.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.5
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available