Review

The impact of compound library size on the performance of scoring functions for structure-based virtual screening

BRIEFINGS IN BIOINFORMATICS (2021)

Journal

BRIEFINGS IN BIOINFORMATICS

Volume 22, Issue 3, Pages -

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bib/bbaa095

Keywords

virtual screening; big data; machine learning; drug design; docking

Funding

ANR Tremplin-ERC grant [ANR-17-ERC2-0003-01]

Automated Summary

Larger training datasets improve the accuracy of machine learning-based scoring functions for structure-based virtual screening, and screening ultra-large compound libraries has enabled the fast discovery of drug leads with low-nanomolar potency. In this study, screening a larger compound library identified more potent actives, and re-ranking molecules with more accurate machine learning-based scoring functions increased their potency further in four of the six targets. Classical and machine learning-based scoring functions often find different actives, which supports using both types of scoring functions on those targets.

Abstract

Larger training datasets have been shown to improve the accuracy of machine learning (ML)-based scoring functions (SFs) for structure-based virtual screening (SBVS). In addition, massive test sets for SBVS, known as ultra-large compound libraries, have been demonstrated to enable the fast discovery of selective drug leads with low-nanomolar potency. This proof-of-concept was carried out on two targets using a single docking tool along with its SF. It is thus unclear whether this high level of performance would generalise to other targets, docking tools and SFs. We found that screening a larger compound library results in more potent actives being identified in all six additional targets using a different docking tool along with its classical SF. Furthermore, we established that a way to improve the potency of the retrieved molecules further is to rank them with more accurate ML-based SFs (we found this to be true in four of the six targets; the difference was not significant in the remaining two targets). A 3-fold increase in average hit rate across targets was also achieved by the ML-based SFs. Lastly, we observed that classical and ML-based SFs often find different actives, which supports using both types of SFs on those targets.
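The abstract reports hit rates and notes that classical and ML-based SFs often retrieve different actives. As a rough illustration only, the sketch below shows one way such quantities could be computed for a top-k selection. The molecule IDs, scores, activity labels, the value of k and the "higher score is better" convention are all hypothetical assumptions made for this example; they are not data or code from the paper.

```python
from typing import Dict, List, Set


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k molecule IDs with the highest score.

    Assumption: a higher score means a better predicted binder.
    Many docking tools report energies where lower is better; in that
    case the sign would need to be flipped before calling this helper.
    """
    return sorted(scores, key=scores.get, reverse=True)[:k]


def hit_rate(selected: List[str], actives: Set[str]) -> float:
    """Fraction of the selected molecules that are true actives."""
    if not selected:
        return 0.0
    return sum(m in actives for m in selected) / len(selected)


# Hypothetical toy library: six molecules, three of them active.
actives = {"m1", "m4", "m6"}
classical_scores = {"m1": 9.1, "m2": 8.7, "m3": 8.0, "m4": 5.2, "m5": 4.9, "m6": 3.0}
ml_scores = {"m1": 6.0, "m2": 5.0, "m3": 4.0, "m4": 8.5, "m5": 3.5, "m6": 9.0}

k = 2  # number of top-ranked molecules assumed to be sent for testing
classical_top = top_k(classical_scores, k)
ml_top = top_k(ml_scores, k)

# Actives retrieved by each scoring function within its own top k.
classical_hits = set(classical_top) & actives
ml_hits = set(ml_top) & actives

# Actives found by one scoring function but missed by the other.
only_classical = classical_hits - ml_hits
only_ml = ml_hits - classical_hits

print("classical hit rate:", hit_rate(classical_top, actives))
print("ML hit rate:", hit_rate(ml_top, actives))
print("actives found only by classical SF:", sorted(only_classical))
print("actives found only by ML SF:", sorted(only_ml))
print("actives found by either SF:", sorted(classical_hits | ml_hits))
```

In this toy case the two rankings retrieve disjoint sets of actives, so taking the union of both top-k lists recovers more actives than either list alone. That is the kind of complementarity the abstract describes, although the numbers here carry no evidential weight.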
