Article

TocoDecoy: A New Approach to Design Unbiased Datasets for Training and Benchmarking Machine-Learning Scoring Functions

JOURNAL OF MEDICINAL CHEMISTRY (2022)

Journal

JOURNAL OF MEDICINAL CHEMISTRY

Volume 65, Issue 11, Pages 7918-7932

Publisher

AMER CHEMICAL SOC

DOI: 10.1021/acs.jmedchem.2c00460

Category

Chemistry, Medicinal

Funding

National Key Research and Development Program of China [2021YFF1201400]
Natural Science Foundation of Zhejiang Province [LZ19H300001, LD22H300001]
Fundamental Research Funds for the Central Universities [2020QNA7003]
Key R&D Program of Zhejiang Province [2020C03010]

Summary

The development of accurate machine-learning-based scoring functions for virtual screening requires unbiased and diverse datasets. However, most existing datasets may suffer from hidden biases and data insufficiency. In this study, the authors developed a new approach named TocoDecoy to generate unbiased and expandable datasets and evaluated its performance against other datasets.

Abstract

Development of accurate machine-learning-based scoring functions (MLSFs) for structure-based virtual screening against a given target requires a large unbiased dataset with structurally diverse actives and decoys. However, most datasets used to develop MLSFs were designed for traditional SFs and may suffer from hidden biases and data insufficiency. Here, we developed a new approach named Topology-based and Conformation-based decoys generation (TocoDecoy), which integrates two strategies that generate decoys by tweaking the actives of a specific target, producing unbiased and expandable datasets for training and benchmarking MLSFs. For hidden bias evaluation, we assessed the performance of InteractionGraphNet (IGN) trained on the TocoDecoy, LIT-PCBA, and DUD-E-like datasets. The results show that the IGN model trained on the TocoDecoy dataset is competitive with the one trained on the LIT-PCBA dataset and substantially outperforms the one trained on the DUD-E-like dataset, suggesting that the decoys in TocoDecoy are unbiased for training and benchmarking MLSFs.