Article

Can machine learning consistently improve the scoring power of classical scoring functions? Insights into the role of machine learning in scoring functions

BRIEFINGS IN BIOINFORMATICS (2021)

Journal

BRIEFINGS IN BIOINFORMATICS

Volume 22, Issue 1, Pages 497-514

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bib/bbz173

Keywords

scoring function (SF); machine learning (ML); scoring power; binding affinity; ML-based SF

Categories

Biochemical Research Methods; Mathematical & Computational Biology

Funding

Key R&D Program of Zhejiang Province [2020C03010]
National Natural Science Foundation of China [21575128, 81773632]
Zhejiang Provincial Natural Science Foundation of China [LZ19H300001]

Summary

This study found that refitting the energy terms of classical scoring functions with machine learning consistently improved their prediction of protein-ligand binding affinity. Gradient boosting decision trees and random forests achieved the best predictions in most cases. The advantage of the machine learning-based scoring functions was assured when the training set contained enough similar targets.

Abstract

Accurately estimating protein-ligand binding affinity remains a key challenge in computer-aided drug design (CADD). In many cases, the binding affinities predicted by classical scoring functions (SFs) do not correlate well with experimentally measured biological activities. In the past few years, machine learning (ML)-based SFs have emerged as potential alternatives and have outperformed classical SFs in a series of studies. In this study, to better assess the potential of classical SFs, we conducted a comparative assessment of 25 commonly used SFs. Their scoring power was then systematically re-estimated by replacing the original multiple linear regression with state-of-the-art ML methods to refit the individual energy terms. The results show that the newly developed ML-based SFs consistently performed better than the classical ones. In particular, gradient boosting decision tree (GBDT) and random forest (RF) achieved the best predictions in most cases. The newly developed ML-based SFs were also tested on another benchmark modified from PDBbind v2007, and the impacts of structural and sequence similarities were evaluated. The results indicated that the superiority of the ML-based SFs could be fully guaranteed when the training set contained sufficient similar targets. Moreover, the effect of combining features from multiple SFs was explored, and the results indicated that combining NNscore2.0 with one to four other classical SFs could yield the best scoring power. However, it was not feasible to derive a generic target-specific SF or SF combination.
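The refitting idea in the abstract (keep a scoring function's energy terms, but swap the multiple linear regression for a non-linear learner) can be illustrated with a small sketch. This is not the authors' pipeline. The two energy terms, the synthetic affinity formula, and the helper names `fit_linear` and `fit_boosted_stumps` are assumptions made for illustration; boosted decision stumps serve only as a toy stand-in for GBDT; and scoring power is taken here to be the Pearson correlation between predicted and reference affinities on held-out complexes, which is a common convention rather than something stated in the abstract.

```python
import math
import random
from itertools import product

def pearson(xs, ys):
    """Pearson correlation, used here as the scoring-power metric."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def fit_linear(X, y):
    """Multiple linear regression via the normal equations (A^T A) w = A^T y."""
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    n = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(n)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(n)]
    for c in range(n):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    w = [M[i][n] for i in range(n)]
    return lambda row: w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))

def fit_boosted_stumps(X, y, rounds=200, lr=0.3):
    """Toy gradient boosting for squared loss: each round fits one
    depth-1 tree (stump) to the current residuals."""
    base = sum(y) / len(y)
    resid = [t - base for t in y]
    stumps = []
    for _ in range(rounds):
        best = None
        for j in range(len(X[0])):
            values = sorted({row[j] for row in X})
            for lo, hi in zip(values, values[1:]):
                thr = (lo + hi) / 2
                left = [r for row, r in zip(X, resid) if row[j] <= thr]
                right = [r for row, r in zip(X, resid) if row[j] > thr]
                ml, mr = sum(left) / len(left), sum(right) / len(right)
                sse = (sum((r - ml) ** 2 for r in left)
                       + sum((r - mr) ** 2 for r in right))
                if best is None or sse < best[0]:
                    best = (sse, j, thr, ml, mr)
        _, j, thr, ml, mr = best
        stumps.append((j, thr, lr * ml, lr * mr))
        resid = [r - (lr * ml if row[j] <= thr else lr * mr)
                 for row, r in zip(X, resid)]

    def predict(row):
        return base + sum(a if row[j] <= thr else b for j, thr, a, b in stumps)
    return predict

# Synthetic data (assumption): two hypothetical energy terms on a grid and an
# affinity that depends non-linearly on the second term.
grid = [i * 0.5 for i in range(-4, 5)]
data = [((t1, t2), 6.0 + 0.5 * t1 + t2 ** 2) for t1, t2 in product(grid, grid)]
random.Random(0).shuffle(data)
train, test = data[:54], data[54:]
X_tr, y_tr = [x for x, _ in train], [t for _, t in train]
X_te, y_te = [x for x, _ in test], [t for _, t in test]

mlr = fit_linear(X_tr, y_tr)
gbdt = fit_boosted_stumps(X_tr, y_tr)
r_mlr = pearson([mlr(x) for x in X_te], y_te)
r_gbdt = pearson([gbdt(x) for x in X_te], y_te)
print(f"MLR scoring power (Pearson R):  {r_mlr:.3f}")
print(f"Boosted-stump scoring power:    {r_gbdt:.3f}")
```

Because the synthetic affinity is deliberately non-linear in one term, the linear refit cannot capture it while the boosted model can. Real energy terms need not behave this way, so the gap shown here says nothing about the size of the improvement reported in the study.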
