Article

Reducing false positive rate of docking-based virtual screening by active learning

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 24, Issue 1, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbac626

Keywords

molecular docking; machine learning-based scoring function (MLSF); active learning; virtual screening (VS); false positive

Machine learning-based scoring functions (MLSFs) have gained popularity because of their potentially superior screening performance compared with classical scoring functions. However, the negative data used to construct MLSFs are rarely reported, and the putative inactive molecules in existing databases are often biased. In this study, we propose an easy-to-use method called AMLSF that combines active learning with an MLSF to improve the quality of inactive sets and reduce the false positive rate. Our results demonstrate that AMLSF outperforms the control models in identifying active molecules and reducing false positives.
Machine learning-based scoring functions (MLSFs) have become a favored alternative to classical scoring functions because of their potentially superior screening performance. However, the negative data used to construct MLSFs are rarely reported in the literature, and the putative inactive molecules recorded in existing databases usually show an obvious bias relative to the active molecules. Here we propose an easy-to-use method named AMLSF that combines active learning, using negative molecular selection strategies, with an MLSF; it iteratively improves the quality of the inactive sets and thus reduces the false positive rate of virtual screening. We chose energy auxiliary terms learning as the MLSF and validated our method on eight targets in the diverse subset of DUD-E. For each target, we screened the IterBioScreen database with AMLSF and compared the screening results with those of the four control models. The number of active molecules among the top 1000 molecules identified by AMLSF was significantly higher than the number identified by the control models. In addition, free energy calculations on the top 10 molecules screened out by AMLSF, the null model and the control models based on DUD-E also showed that AMLSF identifies more active molecules and reduces the false positive rate.
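The iterative idea described in the abstract (retrain, rank, move rejected top-ranked molecules into the inactive set) can be illustrated with a minimal Python sketch. It is not the authors' AMLSF implementation. The nearest-centroid scorer `train` is a toy stand-in for the energy auxiliary terms learning MLSF, the oracle `is_active` is a hypothetical labeling step (for example an assay or a free energy check), and querying the top-ranked pool molecules is only one plausible negative selection strategy. All function names, parameters and the one-dimensional feature vectors are assumptions for illustration. `top_k_active_count` mirrors the "active molecules in the top 1000" metric when called with k=1000.

```python
import math
from typing import Callable, Dict, Sequence, Set, Tuple

Vector = Tuple[float, ...]
Scorer = Callable[[Vector], float]


def centroid(points: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train(actives: Sequence[Vector], inactives: Sequence[Vector]) -> Scorer:
    """Toy stand-in for an MLSF (hypothetical): nearest-centroid score.

    A positive score means the molecule is closer to the active centroid
    than to the inactive centroid, i.e. it is predicted active.
    """
    ca, ci = centroid(actives), centroid(inactives)
    return lambda x: math.dist(x, ci) - math.dist(x, ca)


def active_negative_selection(
    actives: Sequence[Vector],
    inactives: Sequence[Vector],
    pool: Dict[str, Vector],
    is_active: Callable[[str], bool],
    rounds: int = 3,
    batch: int = 2,
) -> Scorer:
    """Iteratively enrich the inactive set with hard negatives (a sketch).

    Each round: train on the current sets, rank the unlabeled pool, send the
    top-`batch` molecules to the hypothetical oracle `is_active`, and move
    each one into the active or inactive set according to its answer.
    The caller's lists and dict are copied, not modified.
    """
    actives, inactives, pool = list(actives), list(inactives), dict(pool)
    for _ in range(rounds):
        score = train(actives, inactives)
        ranked = sorted(pool, key=lambda mid: score(pool[mid]), reverse=True)
        for mid in ranked[:batch]:
            vec = pool.pop(mid)
            # A top-ranked molecule the oracle rejects is a false positive:
            # exactly the kind of hard negative a biased decoy set lacks.
            if is_active(mid):
                actives.append(vec)
            else:
                inactives.append(vec)
    return train(actives, inactives)


def false_positive_count(score: Scorer, decoys: Sequence[Vector]) -> int:
    """Number of known decoys that the scorer predicts active (score > 0)."""
    return sum(1 for v in decoys if score(v) > 0)


def top_k_active_count(
    score: Scorer, library: Dict[str, Vector], active_ids: Set[str], k: int
) -> int:
    """Number of known actives among the k best-scored library molecules."""
    ranked = sorted(library, key=lambda mid: score(library[mid]), reverse=True)
    return sum(1 for mid in ranked[:k] if mid in active_ids)


if __name__ == "__main__":
    actives = [(1.0,), (1.2,)]
    biased_decoys = [(-5.0,), (-4.0,)]  # far from the actives: "easy" negatives
    pool = {"hit1": (1.1,), "hit2": (0.9,), "fp1": (0.2,), "fp2": (0.0,),
            "fp3": (-0.2,), "neg": (-3.0,)}
    oracle = lambda mid: mid.startswith("hit")  # hypothetical ground truth
    held_out_decoys = [(-0.8,), (-1.2,)]

    baseline = train(actives, biased_decoys)
    refined = active_negative_selection(actives, biased_decoys, pool, oracle)
    print(false_positive_count(baseline, held_out_decoys))  # 2
    print(false_positive_count(refined, held_out_decoys))   # 0
```

In this toy run the biased decoys sit far from the actives, so the baseline model calls both held-out decoys active. After the loop has moved the rejected top-ranked pool molecules into the inactive set, the decision boundary shifts toward the actives and neither held-out decoy is predicted active.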
