Article

Prediction of protein-protein interaction sites through eXtreme gradient boosting with kernel principal component analysis

Journal

COMPUTERS IN BIOLOGY AND MEDICINE
Volume 134, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.compbiomed.2021.104516

Keywords

Protein-protein interaction sites; Feature extraction; SMOTE; KPCA; XGBoost

Funding

  1. National Natural Science Foundation of China [61863010]
  2. Key Research and Development Program of Shandong Province of China [2019GGX101001]
  3. Key Laboratory Open Foundation of Hainan Province [JSKX202001]

The PPISP-XGBoost method proposed in this study predicts PPI sites by feeding extracted and optimized protein features to an XGBoost classifier. It achieves higher accuracy than existing methods on multiple datasets, demonstrating its effectiveness for PPI site prediction.

Predicting protein-protein interaction sites (PPI sites) can provide important clues for understanding biological activity. Using machine learning to predict PPI sites can reduce reliance on expensive and time-consuming biological experiments. Here we propose PPISP-XGBoost, a novel PPI site prediction method based on eXtreme gradient boosting (XGBoost). First, characteristic information about each protein is extracted with a sliding window, using the pseudo-position-specific scoring matrix (PsePSSM), pseudo-amino acid composition (PseAAC), hydropathy index and solvent-accessible surface area (ASA). Next, these raw features are preprocessed into more informative representations to improve prediction. In particular, the synthetic minority oversampling technique (SMOTE) is used to address class imbalance, and kernel principal component analysis (KPCA) is applied to remove redundant features. Finally, the optimized features are fed to the XGBoost classifier to identify PPI sites. With PPISP-XGBoost, the prediction accuracy on the training dataset Dset186 reaches 85.4%, and the accuracies on the independent validation datasets Dtestset72, PDBtestset164, Dset_448 and Dset_355 reach 85.3%, 83.9%, 85.8% and 85.4%, respectively. All of these exceed the accuracy of existing PPI site prediction methods. These results demonstrate that PPISP-XGBoost can further improve the prediction of PPI sites.
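
The two preprocessing ideas in the abstract, sliding-window features and SMOTE, can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' implementation, and several choices are assumptions that the abstract does not state:

- The hydropathy index is taken to be the Kyte-Doolittle scale.
- Residues beyond either end of the chain are zero-padded.
- The window half-width (2) and neighbour count (k=2) are arbitrary.
- The sequence, labels and function names (`window_features`, `smote`) are hypothetical.

PsePSSM, PseAAC and ASA are not computed here. They would be appended to each residue's feature vector in the same way.

```python
import math
import random

# ASSUMPTION: the "hydropathy index" is the Kyte-Doolittle scale.
# PsePSSM, PseAAC and ASA values would be appended to each residue's vector.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(per_residue, half_window):
    """Hypothetical helper: for residue i, concatenate the feature vectors of
    residues i-half_window .. i+half_window, zero-padding past the chain ends."""
    dim = len(per_residue[0])
    rows = []
    for i in range(len(per_residue)):
        row = []
        for j in range(i - half_window, i + half_window + 1):
            if 0 <= j < len(per_residue):
                row.extend(per_residue[j])
            else:
                row.extend([0.0] * dim)  # padding outside the sequence
        rows.append(row)
    return rows


def smote(minority, n_new, k=5, seed=0):
    """Hypothetical helper implementing SMOTE-style oversampling: pick a random
    minority sample, pick one of its k nearest minority neighbours, and create
    a new point a random fraction of the way between them.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by Euclidean distance
        neighbours = sorted(
            (m for m in minority if m is not x),
            key=lambda m: math.dist(x, m),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Hypothetical toy chain and labels (1 = interaction site).
SEQUENCE = "MKTAYIAKQR"
LABELS = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0]

per_residue = [[KYTE_DOOLITTLE[aa]] for aa in SEQUENCE]
X = window_features(per_residue, half_window=2)  # window of 5 residues
minority = [x for x, y in zip(X, LABELS) if y == 1]
majority_count = LABELS.count(0)
# Oversample the minority class until both classes are the same size.
X_new = smote(minority, n_new=majority_count - len(minority), k=2)
print(len(X), len(X[0]), len(X_new))
```

In a full pipeline, the balanced training matrix would then be reduced with KPCA and classified with XGBoost. Scikit-learn's `sklearn.decomposition.KernelPCA` and the `xgboost` package's `XGBClassifier` are one way to do this. The abstract does not give the kernel, number of components or boosting hyperparameters used by PPISP-XGBoost, so any values chosen there would also be assumptions. For the oversampling step, imbalanced-learn's `imblearn.over_sampling.SMOTE` provides a tested implementation.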