Article

Exposing the Limitations of Molecular Machine Learning with Activity Cliffs

Journal

JOURNAL OF CHEMICAL INFORMATION AND MODELING
Volume 62, Issue 23, Pages 5938-5951

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jcim.2c01073

Funding

  1. Irene Curie Fellowship
  2. Centre for Living Technologies

Machine learning plays a crucial role in drug discovery and chemistry. However, the effect of activity cliffs - molecules that are structurally similar but exhibit significant differences in potency - on model performance has received limited attention. In this study, we benchmarked 24 machine and deep learning approaches and found that machine learning methods based on molecular descriptors outperformed more complex deep learning methods in predicting the properties of activity cliffs. Our findings highlight the need for dedicated metrics and novel algorithms to address the limitation posed by activity cliffs in molecular machine learning models.
Machine learning has become a crucial tool in drug discovery and chemistry at large, e.g., to predict molecular properties, such as bioactivity, with high accuracy. However, activity cliffs (pairs of molecules that are highly similar in structure but exhibit large differences in potency) have received limited attention for their effect on model performance. Not only are these edge cases informative for molecule discovery and optimization, but models that are well equipped to accurately predict the potency of activity cliffs also have increased potential for prospective applications. Our work aims to fill the current knowledge gap on best-practice machine learning methods in the presence of activity cliffs. We benchmarked a total of 24 machine and deep learning approaches on curated bioactivity data from 30 macromolecular targets for their performance on activity cliff compounds. While all methods struggled in the presence of activity cliffs, machine learning approaches based on molecular descriptors outperformed more complex deep learning methods. Our findings highlight large case-by-case differences in performance, advocating for (a) the inclusion of dedicated activity-cliff-centered metrics during model development and evaluation and (b) the development of novel algorithms to better predict the properties of activity cliffs. To this end, the methods, metrics, and results of this study have been encapsulated into an open-access benchmarking platform named MoleculeACE (Activity Cliff Estimation), available on GitHub at https://github.com/molML/MoleculeACE. MoleculeACE is designed to steer the community toward addressing the pressing but overlooked limitation that activity cliffs pose for molecular machine learning models.
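The abstract defines activity cliffs as pairs of structurally similar molecules with large potency differences, and it calls for activity-cliff-centered evaluation metrics. The sketch below is a minimal illustration under stated assumptions, not the MoleculeACE implementation. It assumes that each molecule is represented as a set of "on" fingerprint bits, that a Tanimoto similarity of at least 0.9 counts as "highly similar", and that a potency gap of at least one log unit (10-fold) counts as a "large difference". The function names `tanimoto`, `find_activity_cliffs`, and `rmse` are hypothetical. The code flags cliff pairs and then reports a model's error separately on the compounds involved in them.

```python
from itertools import combinations
from math import sqrt


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_activity_cliffs(fps, potencies, sim_threshold=0.9, log_diff_threshold=1.0):
    """Return index pairs (i, j) that are structurally similar but differ strongly in potency.

    Assumptions (illustrative only): potencies are on a log scale (e.g. pKi),
    similarity >= 0.9 means "highly similar", and a gap >= 1 log unit
    (10-fold) means "large difference in potency".
    """
    cliffs = []
    for i, j in combinations(range(len(fps)), 2):
        similar = tanimoto(fps[i], fps[j]) >= sim_threshold
        large_gap = abs(potencies[i] - potencies[j]) >= log_diff_threshold
        if similar and large_gap:
            cliffs.append((i, j))
    return cliffs


def rmse(y_true, y_pred, indices=None):
    """Root-mean-square error, optionally restricted to a subset of compound indices."""
    idx = list(range(len(y_true))) if indices is None else sorted(indices)
    if not idx:
        raise ValueError("no compounds to evaluate")
    squared = [(y_true[k] - y_pred[k]) ** 2 for k in idx]
    return sqrt(sum(squared) / len(squared))


# Toy usage with made-up data (not results from the paper).
fps = [set(range(10)), set(range(10)) | {10}, set(range(20, 30))]
potencies = [5.0, 7.0, 5.1]    # hypothetical log-scale potencies
predictions = [5.5, 6.0, 5.0]  # hypothetical model outputs

cliffs = find_activity_cliffs(fps, potencies)
cliff_compounds = {k for pair in cliffs for k in pair}
overall_rmse = rmse(potencies, predictions)
cliff_rmse = rmse(potencies, predictions, cliff_compounds)
print(cliffs)  # [(0, 1)]
```

In this toy example, the error on the two cliff compounds is higher than the error over all three compounds. Reporting the two numbers side by side is one simple way to surface the weakness that a single aggregate metric can hide.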
