4.8 Article

Machine Learning Yield Prediction from NiCOlit, a Small-Size Literature Data Set of Nickel Catalyzed C-O Couplings

Journal

JOURNAL OF THE AMERICAN CHEMICAL SOCIETY
Volume 144, Issue 32, Pages 14722-14730

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/jacs.2c05302

Funding

  1. CNRS
  2. French National Association of Research and Technology (ANRT) [2019/0821]

This study focuses on the application of machine learning in predicting synthetic yields and builds a dataset based on organic reaction publications. The study finds that including optimization data improves prediction accuracy and emphasizes the impact of publication constraints on the exploration of chemical space by the synthetic community.
Synthetic yield prediction using machine learning is intensively studied. Previous work has focused on two categories of data sets: high-throughput experimentation data, as an ideal case study, and data sets extracted from proprietary databases, which are known to have a strong reporting bias toward high yields. However, predicting yields using published reaction data remains elusive. To fill the gap, we built a data set on nickel-catalyzed cross-couplings extracted from organic reaction publications, including scope and optimization information. We demonstrate the importance of including optimization data as a source of failed experiments and emphasize how publication constraints shape the exploration of the chemical space by the synthetic community. While machine learning models still fail to perform out-of-sample predictions, this work shows that adding chemical knowledge enables fair predictions in a low-data regime. Eventually, we hope that this unique public database will foster further improvements of machine learning methods for reaction yield prediction in a more realistic context.
