Article

Learning from Docked Ligands: Ligand-Based Features Rescue Structure-Based Scoring Functions When Trained on Docked Poses

Journal

JOURNAL OF CHEMICAL INFORMATION AND MODELING
Volume 62, Issue 22, Pages 5329-5341

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jcim.1c00096

Funding

  1. Engineering and Physical Sciences Research Council (EPSRC) [EP/G03706X/1, EP/S024093/1, EP/L016044/1]

Machine learning scoring functions for protein-ligand binding affinity perform better on crystal structures than on docked poses, but a hybrid scoring function combining structure-based and ligand-based features shows comparable performance on docked poses to purely structure-based scoring functions trained on crystal poses. However, the hybrid scoring function may not always generalize well to protein targets not represented in the training set, indicating the need for improved scoring functions and additional validation benchmarks.
Machine learning scoring functions for protein-ligand binding affinity have been found to consistently outperform classical scoring functions when trained and tested on crystal structures of bound protein-ligand complexes. However, it is less clear how these methods perform when applied to docked poses of complexes. We explore how the use of docked rather than crystallographic poses for both training and testing affects the performance of machine learning scoring functions. Using the PDBbind Core Sets as benchmarks, we show that the performance of a structure-based machine learning scoring function trained and tested on docked poses is lower than that of the same scoring function trained and tested on crystallographic poses. We construct a hybrid scoring function by combining both structure-based and ligand-based features, and show that its ability to predict binding affinity using docked poses is comparable to that of purely structure-based scoring functions trained and tested on crystal poses. We also present a new, freely available validation set, the Updated DUD-E Diverse Subset, for binding affinity prediction using data from DUD-E and ChEMBL. Despite strong performance on docked poses of the PDBbind Core Sets, we find that our hybrid scoring function sometimes generalizes poorly to a protein target not represented in the training set, demonstrating the need for improved scoring functions and additional validation benchmarks.
