Article

Prediction of Protein-protein Interactions in Arabidopsis thaliana Using Partial Training Samples in a Machine Learning Framework

CURRENT BIOINFORMATICS (2021)

Journal

CURRENT BIOINFORMATICS

Volume 16, Issue 6, Pages 865-879

Publisher

BENTHAM SCIENCE PUBL LTD

DOI: 10.2174/1574893616666210204145254

Keywords

Protein-protein interaction; interaction prediction; protein sequence; sequence encoding; machine learning approach; random forest; Arabidopsis thaliana

Automated Summary

The study developed an effective PPI prediction model for Arabidopsis thaliana using partial training samples in a machine learning framework, combining a random forest classifier with autocorrelation sequence encoding features. The proposed model achieved higher performance scores on both the training and test datasets than the other candidate predictors, suggesting its potential utility in elucidating the biological function of unseen PPIs in Arabidopsis thaliana.

Abstract

Background: Protein-protein interactions (PPIs) play a vital role in a wide range of biological processes, from cell-cell interactions to developmental control, in all organisms. However, experimental identification of PPIs is often laborious, time-consuming and costly compared to computational prediction. Several computational prediction models in the literature are based on complete training samples, but none of them deals with partial training samples.

Objective: The objective of this work was to develop an effective PPI prediction model for Arabidopsis thaliana using partial training samples in a machine learning framework.

Methods: We proposed a computational PPI prediction model that combines a random forest (RF) classifier with autocorrelation (AC) sequence encoding features, using a 1:2 ratio of positive-PPI to unknown-PPI samples.

Results: The proposed prediction model produced the highest average performance scores of sensitivity (94.62%), AUC (0.92) and pAUC (0.189) with the training datasets, and of sensitivity (88.14%), AUC (0.89) and pAUC (0.176) with the test datasets, under 5-fold cross-validation, compared to other candidate predictors based on the LDA, LOGI, ADA, NB, KNN and SVM classifiers. It also achieved the highest TPR (91.82%) and pAUC (0.174) at FPR = 20%, with an AUC of 0.948, compared to other candidate predictors.

Conclusion: The overall performance of the developed model suggests that the proposed predictor might be useful for elucidating the biological function of unseen PPIs from a large number of candidate proteins in Arabidopsis thaliana.
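The abstract reports TPR and pAUC at FPR = 20% without stating how pAUC was computed. The sketch below shows one common reading, assumed here rather than taken from the paper: pAUC as the unnormalized area under the ROC curve for FPR between 0 and 0.2, which has a maximum of 0.2 and so would be consistent with reported values such as 0.189 and 0.174. The function names (`roc_points`, `partial_auc`, `tpr_at_fpr`) and the toy labels and scores are hypothetical and are not the authors' code or data.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR), starting at (0, 0) and ending at (1, 1).

    labels: 1 for a positive (interacting) pair, 0 otherwise.
    scores: classifier scores, higher meaning more likely positive.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Samples with tied scores cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def partial_auc(points, max_fpr=0.2):
    """Unnormalized trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR at the FPR cut-off.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def tpr_at_fpr(points, target_fpr=0.2):
    """Highest TPR reachable at an operating point with FPR <= target_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= target_fpr)


# Toy data (hypothetical): a classifier that ranks every positive first.
labels = [1, 1, 0, 0]
scores = [0.9, 0.8, 0.2, 0.1]
curve = roc_points(labels, scores)
print(partial_auc(curve, max_fpr=0.2))  # area restricted to FPR <= 0.2
print(partial_auc(curve, max_fpr=1.0))  # full AUC
print(tpr_at_fpr(curve, target_fpr=0.2))
```

Under this definition a perfect ranking reaches the ceiling of 0.2, so a pAUC should be read against that ceiling rather than against 1. Some libraries instead report a standardized partial AUC rescaled to the range 0.5 to 1, which is not comparable to the unnormalized figure.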
