4.7 Article

Protein-protein interaction sites prediction by ensemble random forests with synthetic minority oversampling technique

Journal

BIOINFORMATICS
Volume 35, Issue 14, Pages 2395-2402

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/bty995

Keywords

-

Funding

  1. National Natural Science Foundation of China [61863010, 11771188]
  2. Natural Science Foundation of Shandong Province of China [ZR2018MC007]
  3. Project of Shandong Province Higher Educational Science and Technology Program [J17KA159]
  4. National Institute of General Medical Sciences of the National Institutes of Health [1R01GM131399-01]
  5. National Natural Science Foundation of China (NSFC) [61772313, 61432010]
  6. Young Scholars Program of Shandong University (YSPSDU) [2015WLJH19]

Motivation: Predicting protein-protein interaction (PPI) sites is key to mutation design, the study of catalytic reactions and the reconstruction of PPI networks. The task is challenging because of the large number of available sequences and the imbalance between classes in the samples.

Results: This study proposes a new ensemble learning-based method for PPI site prediction, Ensemble Learning of synthetic minority oversampling technique (SMOTE) for Unbalancing samples and RF algorithm (EL-SMURF). The sequence profile feature and the residue evolution rates were combined to extract features of neighboring residues using a sliding window, and SMOTE was applied to oversample interface residues in the feature space to address the imbalance problem. The Multi-dimensional Scaling feature selection method was used to reduce feature redundancy and select a feature subset. Finally, Random Forest classifiers were used to build the ensemble learning model, and the optimal feature vectors were fed into EL-SMURF to predict PPI sites. On two independent validation datasets, EL-SMURF achieved 77.1% and 77.7% accuracy, which were 6.2-15.7% and 6.1-18.9% higher, respectively, than those of other existing tools.

Availability and implementation: The source code and data used in this study are publicly available at http://github.com/QUST-AIBBDRC/EL-SMURF/.

Supplementary information: Supplementary data are available at Bioinformatics online.
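
The sliding-window feature extraction and SMOTE oversampling steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the repository linked above). The function names `sliding_window_features` and `smote`, the window size, the zero padding at the sequence ends, the toy feature values and the labels are all assumptions made for illustration. The feature selection and Random Forest stages are left out.

```python
import math
import random


def sliding_window_features(per_residue, window=3, pad=0.0):
    """Concatenate each residue's feature vector with those of its neighbours.

    per_residue: list of equal-length feature vectors, one per residue.
    window: odd window size; positions beyond either end are filled with `pad`
            (zero padding is an assumption of this sketch).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    blank = [pad] * len(per_residue[0])
    rows = []
    for i in range(len(per_residue)):
        row = []
        for j in range(i - half, i + half + 1):
            row.extend(per_residue[j] if 0 <= j < len(per_residue) else blank)
        rows.append(row)
    return rows


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic samples, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if rng is None:
        rng = random.Random(0)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = sorted(
            (m for idx, m in enumerate(minority) if idx != i),
            key=lambda m: math.dist(base, m),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical): 6 residues with 2 features each;
# residues 1 and 4 are labelled as interface residues (the minority class).
profile = [[0.1, 0.9], [0.8, 0.2], [0.2, 0.7], [0.3, 0.6], [0.9, 0.1], [0.2, 0.8]]
labels = [0, 1, 0, 0, 1, 0]

X = sliding_window_features(profile, window=3)
minority = [x for x, y in zip(X, labels) if y == 1]
n_missing = labels.count(0) - len(minority)
new_samples = smote(minority, n_new=n_missing, k=1)

# Balanced training set: original samples plus synthetic interface samples.
X_balanced = X + new_samples
y_balanced = labels + [1] * len(new_samples)
```

Each synthetic sample lies on the line segment between two real interface samples, so the minority class is enlarged in feature space rather than by duplicating rows. In practice, oversampling of this kind is applied to the training data only, so that validation residues stay untouched.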
