4.7 Review

The impact of compound library size on the performance of scoring functions for structure-based virtual screening

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 22, Issue 3

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbaa095

Keywords

virtual screening; big data; machine learning; drug design; docking

Funding

  1. ANR Tremplin-ERC grant [ANR-17-ERC2-0003-01]


Increasing the size of training datasets improves the accuracy of machine learning-based scoring functions in structure-based virtual screening, and screening ultra-large compound libraries can enable the fast discovery of drug leads with low-nanomolar potency. Screening a larger compound library results in more potent actives being identified, and ranking molecules with more accurate machine learning-based scoring functions can further improve the potency of the retrieved molecules. Additionally, classical and machine learning-based scoring functions often find different actives, which supports using both types of scoring functions on those targets.
Larger training datasets have been shown to improve the accuracy of machine learning (ML)-based scoring functions (SFs) for structure-based virtual screening (SBVS). In addition, massive test sets for SBVS, known as ultra-large compound libraries, have been demonstrated to enable the fast discovery of selective drug leads with low-nanomolar potency. This proof-of-concept was carried out on two targets using a single docking tool along with its SF. It is thus unclear whether this high level of performance would generalise to other targets, docking tools and SFs. We found that screening a larger compound library results in more potent actives being identified in all six additional targets using a different docking tool along with its classical SF. Furthermore, we established that a way to improve the potency of the retrieved molecules further is to rank them with more accurate ML-based SFs (we found this to be true in four of the six targets; the difference was not significant in the remaining two targets). A 3-fold increase in average hit rate across targets was also achieved by the ML-based SFs. Lastly, we observed that classical and ML-based SFs often find different actives, which supports using both types of SFs on those targets.
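To make the two quantities compared in the abstract concrete, the following is a minimal sketch of how a hit rate among top-ranked molecules, and the overlap between the actives found by two scoring functions, might be computed. It is not the authors' protocol: the molecule IDs, scores, cutoff and the helper names `top_n` and `hit_rate` are all hypothetical, and the sketch assumes that a higher score means a better-ranked molecule (many docking scores use the opposite convention).

```python
from typing import Dict, List, Set


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Return the n molecule IDs with the highest scores (assumes higher = better)."""
    return sorted(scores, key=lambda mol: scores[mol], reverse=True)[:n]


def hit_rate(selected: List[str], actives: Set[str]) -> float:
    """Fraction of the selected molecules that are known actives."""
    if not selected:
        return 0.0
    return sum(mol in actives for mol in selected) / len(selected)


# Toy, made-up data: six molecules, three of which are active.
actives = {"m1", "m4", "m6"}
classical_scores = {"m1": 9.0, "m2": 8.5, "m3": 7.0, "m4": 3.0, "m5": 2.0, "m6": 1.0}
ml_scores = {"m1": 2.0, "m2": 1.0, "m3": 3.0, "m4": 9.5, "m5": 4.0, "m6": 8.0}

# Rank the library with each scoring function and keep the top 2.
classical_top = top_n(classical_scores, 2)
ml_top = top_n(ml_scores, 2)

# Actives retrieved by each scoring function.
classical_hits = set(classical_top) & actives
ml_hits = set(ml_top) & actives

classical_rate = hit_rate(classical_top, actives)
ml_rate = hit_rate(ml_top, actives)

# Actives found by both, and by at least one, of the two scoring functions.
shared_hits = classical_hits & ml_hits
combined_hits = classical_hits | ml_hits
```

In this toy case the two rankings retrieve disjoint sets of actives, so combining them recovers more actives than either alone, which is the kind of complementarity the abstract reports for classical and ML-based SFs.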
