Article

In silico active learning for small molecule properties

Journal

MOLECULAR SYSTEMS DESIGN & ENGINEERING
Volume 7, Issue 12, Pages 1611-1621

Publisher

ROYAL SOC CHEMISTRY
DOI: 10.1039/d2me00137c

Funding

  1. 3M company

Machine learning is a promising technology for accelerating materials discovery. A reliable, reproducible, and automated simulation pipeline lets users with different backgrounds easily generate thermodynamic data and drive further chemical exploration through active learning methods.
Machine learning (ML) has emerged as a promising technology to accelerate materials discovery. While systematic screening of vast chemical spaces is computationally expensive, ML algorithms offer a directed approach to identifying and testing promising molecular candidates for specific applications. Two significant hurdles to developing robust ML models are the quality and quantity of existing experimental and ab initio data for training in new chemical spaces. Here we present a reliable, reproducible, and fully automated simulation pipeline that enables users with varied backgrounds to easily generate thermodynamic data. Our atomistic simulation pipeline is GPU accelerated and suitable for high-performance computing (HPC) environments. We validate our pipeline results against dedicated experimental work and existing literature data, then demonstrate how ML may be employed via an active learning approach to further drive chemical exploration. First, ML models are trained to predict thermodynamic properties for a large set of small molecule candidates. Second, new molecules are picked and simulated based on the model uncertainty and expected model improvement. These additional simulations enhance the predictive capabilities of the model until we are satisfied with the overall prediction capability. We simulate 410 molecules using active learning within the automated simulation pipeline to enumerate properties of interest. Across our set of over 6000 small molecule candidates, our active learning procedure predicts monomer properties at error rates substantially lower than those of a random selection baseline. We demonstrate that this approach is capable of reducing the number of completed simulations while simultaneously generating a reliable final model to predict thermodynamic properties for a large number of small molecules.
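The two-step loop described in the abstract (train a model, then pick new molecules to simulate based on model uncertainty) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a bootstrap ensemble of ridge regressions as the model, the ensemble standard deviation as the uncertainty measure, and synthetic feature vectors in place of molecular descriptors. It also selects on uncertainty alone and omits the expected-model-improvement criterion. The names `fit_ridge`, `ensemble_predict`, `active_learning`, and `simulate` are hypothetical, and `simulate` stands in for a call to the simulation pipeline.

```python
import numpy as np

def fit_ridge(X, y, alpha=1e-2):
    """Closed-form ridge regression; returns the weight vector."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

def ensemble_predict(X_train, y_train, X_pool, n_models=20, rng=None):
    """Bootstrap ensemble: mean prediction and std (uncertainty proxy) per row of X_pool."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(X_train)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # resample training rows with replacement
        preds.append(X_pool @ fit_ridge(X_train[idx], y_train[idx]))
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)

def active_learning(X, simulate, n_init=10, batch=5, n_rounds=8, seed=0):
    """Label a random seed set, then repeatedly label the most uncertain candidates."""
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in rng.choice(len(X), size=n_init, replace=False)]
    y = {i: simulate(i) for i in labeled}  # candidate index -> simulated property
    for _ in range(n_rounds):
        pool = np.array([i for i in range(len(X)) if i not in y])
        y_train = np.array([y[i] for i in labeled])
        _, std = ensemble_predict(X[labeled], y_train, X[pool], rng=rng)
        # Pick the `batch` unlabeled candidates with the largest ensemble std.
        for i in pool[np.argsort(std)[-batch:]]:
            y[int(i)] = simulate(int(i))
            labeled.append(int(i))
    y_train = np.array([y[i] for i in labeled])
    mean, _ = ensemble_predict(X[labeled], y_train, X, rng=rng)
    return labeled, mean

# Synthetic stand-in data (assumption): 200 candidates with 5 features each
# and a noisy linear "property" playing the role of a simulation result.
data_rng = np.random.default_rng(42)
X_demo = data_rng.normal(size=(200, 5))
y_true = X_demo @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + data_rng.normal(scale=0.01, size=200)

labeled, pred = active_learning(X_demo, simulate=lambda i: y_true[i])
rmse = float(np.sqrt(np.mean((pred - y_true) ** 2)))
print(f"simulated {len(labeled)} of {len(X_demo)} candidates, RMSE = {rmse:.4f}")
```

To compare against a random selection baseline in this sketch, one could replace the uncertainty-based selection with a random draw from `pool` and compare the resulting RMSE values at an equal simulation budget.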
