4.6 Article

Incorporating Domain Knowledge and Structure-Based Descriptors for Machine Learning: A Case Study of Pd-Catalyzed Sonogashira Reactions

Journal

MOLECULES
Volume 28, Issue 12

Publisher

MDPI
DOI: 10.3390/molecules28124730

Keywords

activation energy; homogeneous catalysis; ligand effects; machine learning; phosphine ligands


In this study, simple molecular representations of phosphine ligands were developed and used in a machine learning model to predict rate constants for palladium-catalyzed Sonogashira coupling reactions of aryl bromides. The results show that incorporating domain knowledge into machine learning can yield mechanistic insight even from a relatively small dataset.
Machine learning has revolutionized information processing for large datasets across various fields. However, its limited interpretability poses a significant challenge when applied to chemistry. In this study, we developed a set of simple molecular representations to capture the structural information of ligands in palladium-catalyzed Sonogashira coupling reactions of aryl bromides. Drawing inspiration from human understanding of catalytic cycles, we used a graph neural network to extract structural details of the phosphine ligand, a major contributor to the overall activation energy. We combined these simple molecular representations with an electronic descriptor of aryl bromide as inputs for a fully connected neural network unit. The results allowed us to predict rate constants and gain mechanistic insights into the rate-limiting oxidative addition process using a relatively small dataset. This study highlights the importance of incorporating domain knowledge in machine learning and presents an alternative approach to data analysis.
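The pipeline described in the abstract (a graph-based encoding of the phosphine ligand, joined with one electronic descriptor of the aryl bromide and passed to a fully connected network) can be illustrated with a minimal sketch. This is not the authors' implementation. The atom features, the sum-aggregation rule, the mean-pool readout, the layer sizes, the weights, and the descriptor value are all assumptions chosen to keep the example small, and the untrained output has no chemical meaning.

```python
# Hypothetical sketch: one message-passing step over a ligand graph,
# followed by a tiny fully connected network. All numbers are placeholders.

def message_pass(features, adjacency):
    """Each atom adds its neighbours' feature vectors to its own (sum aggregation)."""
    updated = []
    for i, own in enumerate(features):
        agg = list(own)
        for j in adjacency[i]:
            agg = [a + b for a, b in zip(agg, features[j])]
        updated.append(agg)
    return updated

def readout(features):
    """Mean-pool atom vectors into one fixed-length ligand vector."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]

def relu(value):
    return max(0.0, value)

def dense(x, weights, bias, activation=None):
    """Fully connected layer: one weight row and one bias per output unit."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, bias)]
    return [activation(v) for v in out] if activation else out

def predict(ligand_feats, ligand_adj, aryl_descriptor, params):
    """Encode the ligand graph, append the aryl bromide descriptor, regress one value."""
    ligand_vec = readout(message_pass(ligand_feats, ligand_adj))
    x = ligand_vec + [aryl_descriptor]
    hidden = dense(x, params["W1"], params["b1"], relu)
    return dense(hidden, params["W2"], params["b2"])[0]

# Toy ligand: a phosphorus atom bonded to three carbons (hydrogens omitted).
# Assumed atom features: one-hot [is_P, is_C].
ligand_feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
ligand_adj = [[1, 2, 3], [0], [0], [0]]

# Placeholder (untrained) weights: 3 inputs -> 2 hidden units -> 1 output.
params = {
    "W1": [[0.1, 0.2, 0.3], [-0.1, 0.1, 0.2]],
    "b1": [0.0, 0.0],
    "W2": [[1.0, -1.0]],
    "b2": [0.0],
}

# 0.5 stands in for whichever electronic descriptor of the aryl bromide is used.
prediction = predict(ligand_feats, ligand_adj, 0.5, params)
print(prediction)
```

In a real model the weights would be fitted to measured rate constants. The point here is only the data flow: the ligand structure enters as a graph, the aryl bromide enters as a single number, and the two are concatenated before the fully connected layers.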


