4.6 Article

Predicting Golgi-Resident Protein Types Using Conditional Covariance Minimization With XGBoost Based on Multiple Features Fusion

Journal

IEEE ACCESS
Volume 7, Pages 144154-144164

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2019.2938081

Keywords

Golgi-resident protein; multi-information fusion; conditional covariance minimization; synthetic minority over sampling technique; extreme gradient boosting

Funding

  1. National Natural Science Foundation of China [61863010, 11771188]
  2. Natural Science Foundation of Shandong Province of China [ZR2018MC007]
  3. Project of Shandong Province Higher Educational Science and Technology Program [J17KA159, J16LI51]
  4. Scientific Research Fund of Hunan Provincial Key Laboratory of Mathematical Modeling and Analysis in Engineering [2018MMAEZD10]
  5. Key Research and Development Program of Shandong Province of China [2019GGX101001]
  6. National Science Foundation [ACI-1548562]
  7. Extreme Science and Engineering Discovery Environment

The Golgi apparatus is a key organelle for protein synthesis in eukaryotic cells. Dysfunction of Golgi-resident proteins can lead to a range of diseases, especially neurodegenerative and inherited diseases, including diabetes, cancer, and cystic fibrosis. Accurate classification of Golgi-resident proteins may therefore contribute to drug development and, in turn, to drug therapy. This paper presents a novel method for predicting Golgi-resident protein types, called Golgi-XGBoost. First, feature vectors are extracted from the protein sequences by fusing pseudo-amino acid composition (PseAAC), dipeptide composition (DC), pseudo-position specific scoring matrix (PsePSSM), and encoding based on grouped weight (EBGW). Second, conditional covariance minimization (CCM) is used to reduce the dimension of the feature vectors. Third, the synthetic minority oversampling technique (SMOTE) is applied to balance the samples. Finally, the optimal feature vectors are input into the extreme gradient boosting (XGBoost) classifier to predict the type of Golgi-resident protein. Under the jackknife test, the overall prediction accuracy on the training set is 92.1%, which is higher than that of other state-of-the-art methods. The accuracy on the independent test dataset is 86.5%. These results indicate that Golgi-XGBoost offers a new method for predicting the type of Golgi-resident protein.
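Two of the steps named in the abstract, dipeptide composition and SMOTE-style oversampling, are simple enough to illustrate with the standard library alone. The sketch below is not the authors' implementation: the function names `dipeptide_composition` and `smote_like`, the default neighbor count `k`, and the choice to normalize by the number of valid residue pairs are all assumptions made for illustration. PseAAC, PsePSSM, EBGW, CCM, and the XGBoost classifier are not shown.

```python
import random
from itertools import product

# The 20 standard amino acids and all 400 ordered pairs of them.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of dipeptide frequencies.

    Assumption: pairs containing a non-standard residue are skipped and
    frequencies are normalized by the number of valid pairs.
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for first, second in zip(sequence, sequence[1:]):
        pair = first + second
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolation.

    A simplified SMOTE-style sketch: pick a minority sample, pick one of
    its k nearest minority neighbors, and sample a point on the segment
    between the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Sort the other minority samples by squared Euclidean distance to x.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic
```

In a full pipeline the DC vector would be concatenated with the other three encodings before dimension reduction, and oversampling would be applied only to the training data so that the independent test set stays untouched.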
