Machine learning classification can reduce false positives in structure-based virtual screening

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2020)

Journal

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA

Volume 117, Issue 31, Pages 18477-18488

Publisher

NATL ACAD SCIENCES

DOI: 10.1073/pnas.2000585117

Keywords

virtual screening; machine learning classifier; structure-based drug design; protein-ligand complex

Category

Multidisciplinary Sciences

Funding

National Science Foundation [ACI-1548562, CHE-1836950]
National Institute of General Medical Sciences [R01GM099959, R01GM112736, R01GM123336]
NIH/NCI Cancer Center Support Grant [P30CA006927]

Abstract

With the recent explosion in the size of libraries available for screening, virtual screening is positioned to assume a more prominent role in early drug discovery's search for active chemical matter. In typical virtual screens, however, only about 12% of the top-scoring compounds actually show activity when tested in biochemical assays. We argue that most scoring functions used for this task have been developed with insufficient attention to the datasets on which they are trained and tested, leading to overly simplistic models and/or overtraining. These problems are compounded in the literature because studies reporting new scoring methods have not validated their models prospectively within the same study. Here, we report a strategy for building a training dataset (D-COID) that aims to generate highly compelling decoy complexes that are individually matched to available active complexes. Using this dataset, we train a general-purpose classifier for virtual screening (vScreenML) that is built on the XGBoost framework. In retrospective benchmarks, our classifier shows outstanding performance relative to other scoring functions. In a prospective context, nearly all candidate inhibitors from a screen against acetylcholinesterase show detectable activity; beyond this, 10 of 23 compounds have an IC50 better than 50 μM. Without any medicinal chemistry optimization, the most potent hit has an IC50 of 280 nM, corresponding to a Ki of 173 nM. These results support using the D-COID strategy for training classifiers in other computational biology tasks, and using vScreenML in virtual screening campaigns against other protein targets. Both D-COID and vScreenML are freely distributed to facilitate such efforts.
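The figures quoted in the abstract can be put side by side with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: the helper names `hit_rate` and `implied_substrate_ratio` are hypothetical, and the second function assumes that the IC50-to-Ki conversion follows the Cheng-Prusoff relation for a competitive inhibitor, Ki = IC50 / (1 + [S]/Km), which the abstract does not state. Note also that the 12% figure refers to compounds showing any activity, while 10 of 23 refers to compounds with IC50 better than 50 μM, so the two rates are not measured against the same criterion.

```python
def hit_rate(n_active: int, n_tested: int) -> float:
    """Fraction of tested compounds that met the chosen activity criterion."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return n_active / n_tested


def implied_substrate_ratio(ic50_nm: float, ki_nm: float) -> float:
    """Solve Ki = IC50 / (1 + [S]/Km) for [S]/Km.

    Assumption (not stated in the abstract): competitive inhibition,
    so the Cheng-Prusoff relation applies.
    """
    if ic50_nm <= 0 or ki_nm <= 0:
        raise ValueError("IC50 and Ki must be positive")
    return ic50_nm / ki_nm - 1.0


if __name__ == "__main__":
    # Numbers quoted in the abstract.
    prospective = hit_rate(10, 23)   # compounds with IC50 better than 50 uM
    typical = 0.12                   # typical hit rate cited in the abstract
    ratio = implied_substrate_ratio(280.0, 173.0)

    print(f"Prospective hit rate: {prospective:.1%}")
    print(f"Ratio to typical hit rate: {prospective / typical:.1f}x")
    print(f"[S]/Km implied under the Cheng-Prusoff assumption: {ratio:.2f}")
```

If the Cheng-Prusoff assumption holds, the implied [S]/Km is a property of the assay conditions rather than of the compound, so the same factor would relate IC50 to Ki for the other hits measured in that assay.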
