4.6 Article

Machine Learning Models to Predict Protein-Protein Interaction Inhibitors

Journal

MOLECULES
Volume 27, Issue 22

Publisher

MDPI
DOI: 10.3390/molecules27227986

Keywords

chemoinformatics; computer-aided drug design; drug discovery; machine learning; protein-protein interaction

Funding

  1. Direccion General de Computo y de Tecnologias de Informacion y Comunicacion (DGTIC), UNAM [LANCAD-UNAM-DGTIC-335]
  2. School of Chemistry, Universidad Nacional Autonoma de Mexico [5000-9163]

This study develops a classification model to identify protein-protein interaction inhibitors using machine learning algorithms and chemoinformatics techniques, and provides free code. The results show that model performance varies with the algorithm and the molecular fingerprint used in training, with random forest models trained on extended connectivity fingerprints of radius 2 (ECFP4) having the best classification abilities.
Protein-protein interaction (PPI) inhibitors have an increasing role in drug discovery. It is hypothesized that machine learning (ML) algorithms can classify or identify PPI inhibitors. This work describes the performance of different algorithms and molecular fingerprints used in chemoinformatics to develop a classification model that identifies PPI inhibitors, and makes the code freely available to the community, particularly to medicinal chemistry research groups working with PPI inhibitors. We found that the performance of the classification algorithms depends on the features used in the training process. Random forest (RF) models trained with extended connectivity fingerprints of radius 2 (ECFP4) had the best classification abilities compared to models trained with ECFP6 or MACCS keys (166 bits). In general, logistic regression (LR) models had lower performance metrics than RF models, but ECFP4 was the most appropriate representation for LR. ECFP4 also generated models with high performance metrics with support vector machines (SVM). We also constructed ensemble models based on the top-performing models. As part of this work, and to help non-computational experts, we developed a freely available code pipeline.
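
The abstract mentions ensemble models built from the top-performing models but does not say how their predictions were combined. As a minimal sketch, assuming soft voting (averaging each model's predicted probability of the "PPI inhibitor" class), the idea could look like the following. The function name `soft_vote`, the 0.5 threshold, and all probability values are hypothetical and are not taken from the paper or its published pipeline.

```python
from statistics import mean


def soft_vote(probabilities_per_model, threshold=0.5):
    """Average per-model probabilities for each compound, then threshold.

    probabilities_per_model: one list per model, each holding that model's
    predicted probability of the positive class for the same compounds,
    in the same order.
    """
    n_compounds = len(probabilities_per_model[0])
    if any(len(p) != n_compounds for p in probabilities_per_model):
        raise ValueError("every model must score the same compounds")
    # zip(*...) regroups the scores so each tuple holds one compound's scores
    averaged = [mean(scores) for scores in zip(*probabilities_per_model)]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Made-up probabilities for four compounds from three hypothetical models
rf_ecfp4 = [0.9, 0.2, 0.6, 0.4]
svm_ecfp4 = [0.8, 0.1, 0.5, 0.7]
lr_ecfp4 = [0.7, 0.3, 0.7, 0.1]

averaged, labels = soft_vote([rf_ecfp4, svm_ecfp4, lr_ecfp4])
print(labels)
```

Hard (majority) voting or stacking would be equally plausible readings of "ensemble"; soft voting is shown only because it is the simplest to illustrate without extra dependencies.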