Article

True Accuracy of Fast Scoring Functions to Predict High-Throughput Screening Data from Docking Poses: The Simpler the Better

Journal

JOURNAL OF CHEMICAL INFORMATION AND MODELING
Volume 61, Issue 6, Pages 2788-2797

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jcim.1c00292

Funding

  1. Doctoral School of Chemical Sciences (EDSC, University of Strasbourg)

This study conducted an unbiased evaluation of four scoring functions on a high-confidence experimental screening dataset, revealing that rescoring based on simple interaction fingerprints or interaction graphs outperforms advanced machine learning and deep learning scoring functions in most cases. It also highlights the tendency of deep learning methods to predict affinity values within a narrow range centered on the mean value of training samples, and suggests the importance of pre-existing binding modes in detecting the most potent binders.
Hundreds of fast scoring functions have been developed over the last 20 years to predict binding free energies from three-dimensional structures of protein-ligand complexes. Despite promising statistical performance in many reports, we believe that none of them has been properly validated for daily prospective high-throughput virtual screening studies, mostly because in silico screening challenges usually employ artificially built and biased datasets. Here we carry out a fully unbiased evaluation of four scoring functions (Pafnucy, Delta vinaRF20, IFP, and GRIM) on an in-house developed collection of high-confidence experimental screening data (LIT-PCBA) covering about 3 million data points on 15 diverse pharmaceutical targets. All four scoring functions were applied to rescore the docking poses of LIT-PCBA compounds under conditions that exactly mimic standard drug discovery scenarios and were compared in terms of their propensity to enrich true binders in the top 1% of ranked hit lists. Interestingly, rescoring based on simple interaction fingerprints or interaction graphs outperforms state-of-the-art machine learning and deep learning scoring functions in most cases. The current study notably highlights the strong tendency of deep learning methods to predict affinity values within a very narrow range centered on the mean value of the samples used for training. Moreover, it suggests that knowledge of pre-existing binding modes is the key to detecting the most potent binders.
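
The comparison criterion named in the abstract, enrichment of true binders in the top 1% of a ranked hit list, can be illustrated with a minimal sketch. This is not the authors' code: the function name `enrichment_factor`, the convention that a higher score means a better-ranked compound, and the toy data are all assumptions made for illustration. The enrichment factor is taken here as the hit rate in the top fraction divided by the hit rate in the whole library.

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    is_active: Sequence[bool],
    fraction: float = 0.01,
) -> float:
    """Enrichment factor in the top `fraction` of a ranked list.

    Hypothetical helper (not from the paper). Assumes that a HIGHER score
    means a better rank; negate the scores first if lower is better.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Size of the top-ranked subset, at least one compound.
    n_top = max(1, int(n_total * fraction))

    # Rank compound indices by decreasing score (ties keep input order).
    ranked = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    hits_top = sum(1 for i in ranked[:n_top] if is_active[i])

    # (hits_top / n_top) / (n_actives / n_total), written as one ratio.
    return (hits_top * n_total) / (n_top * n_actives)


if __name__ == "__main__":
    # Toy library: 200 compounds, 4 actives, one of them ranked first.
    toy_scores = [float(i) for i in range(200)]
    toy_labels = [i in (199, 150, 10, 5) for i in range(200)]
    print(enrichment_factor(toy_scores, toy_labels, fraction=0.01))
```

A value of 1 corresponds to random ranking, and the maximum attainable value depends on the proportion of actives in the library, so values are only comparable across scoring functions when computed on the same target set.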