SubMito-XGBoost: predicting protein submitochondrial localization by fusing multiple feature information and eXtreme gradient boosting

Journal

BIOINFORMATICS
Volume 36, Issue 4, Pages 1074-1081

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btz734

Funding

  1. National Natural Science Foundation of China [61863010]
  2. Key Research and Development Program of Shandong Province of China [2019GGX101001]
  3. Natural Science Foundation of Shandong Province of China [ZR2017MA014, ZR2018MC007]
  4. Project of Shandong Province Higher Educational Science and Technology Program [J17KA159]
  5. Scientific Research Fund of Hunan Provincial Key Laboratory of Mathematical Modelling and Analysis in Engineering [2018MMAEZD10]
  6. National Science Foundation [ACI-1548562]

Motivation: Mitochondria are essential organelles in most eukaryotes. They not only play an important role in energy metabolism but also take part in many critical cytopathological processes. Abnormal mitochondria can trigger a series of human diseases, such as Parkinson's disease, multifactorial disorders and Type-II diabetes. Knowing a protein's submitochondrial location helps clarify its function, which in turn supports the study of disease pathogenesis and drug design.

Results: We propose a new method, SubMito-XGBoost, for predicting protein submitochondrial localization. It has three steps: (i) the g-gap dipeptide composition (g-gap DC), pseudo-amino acid composition (PseAAC), auto-correlation function (ACF) and Bi-gram position-specific scoring matrix (Bi-gram PSSM) are used to extract protein sequence features; (ii) the Synthetic Minority Oversampling Technique (SMOTE) is used to balance the samples, and the ReliefF algorithm is applied for feature selection; and (iii) the resulting feature vectors are fed into XGBoost to predict protein submitochondrial locations. Under leave-one-out cross-validation (LOOCV), SubMito-XGBoost compares favorably with existing methods. Its prediction accuracies on the two training datasets M317 and M983 were 97.7% and 98.9%, which are 2.8-12.5% and 3.8-9.9% higher than other methods, respectively. Its prediction accuracy on the independent test set M495 was 94.8%, which is significantly better than existing studies. The proposed method also achieves satisfactory predictive performance on plant and non-plant protein submitochondrial datasets. SubMito-XGBoost may also be useful in new drug design for the treatment of related diseases.
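As an illustration of the first feature in step (i), the following is a minimal sketch of a g-gap dipeptide composition encoder. It is not the authors' implementation: the function name, the default g = 0, the handling of non-standard residues and the normalization by the number of valid pairs are all assumptions, since the abstract does not specify them. The idea is to count every pair of residues separated by g positions and turn the counts into a 400-dimensional frequency vector.

```python
from itertools import product

# The 20 standard amino acids; 20 x 20 = 400 possible dipeptides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
INDEX = {dipeptide: i for i, dipeptide in enumerate(DIPEPTIDES)}


def g_gap_dipeptide_composition(sequence, g=0):
    """Return a 400-dimensional g-gap dipeptide frequency vector.

    Hypothetical helper for illustration only. A g-gap dipeptide is the
    pair (residue i, residue i + g + 1); g = 0 gives ordinary adjacent
    dipeptides. Pairs containing a non-standard residue are skipped
    (an assumption), and counts are divided by the number of valid pairs.
    """
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    valid_pairs = 0
    for i in range(len(seq) - g - 1):
        idx = INDEX.get(seq[i] + seq[i + g + 1])
        if idx is not None:
            counts[idx] += 1
            valid_pairs += 1
    if valid_pairs == 0:
        # Sequence too short for this gap, or no standard residues.
        return [0.0] * len(DIPEPTIDES)
    return [count / valid_pairs for count in counts]


if __name__ == "__main__":
    features = g_gap_dipeptide_composition("ACACA", g=1)
    # With g = 1 the pairs are (A,A), (C,C), (A,A).
    print(features[INDEX["AA"]], features[INDEX["CC"]])
```

In the pipeline described above, a vector like this would be concatenated with the PseAAC, ACF and Bi-gram PSSM features before SMOTE balancing, ReliefF selection and XGBoost classification; those later stages are not shown here.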
