
Transfer learning for solvation free energies: From quantum chemistry to experiments

CHEMICAL ENGINEERING JOURNAL (2021)

Journal

CHEMICAL ENGINEERING JOURNAL

Volume 418

Publisher

ELSEVIER SCIENCE SA

DOI: 10.1016/j.cej.2021.129307

Keywords

Transfer learning; Solvation free energy; COSMO-RS; Quantum chemistry; Aleatoric uncertainty

Automated Summary

This work proposes a transfer learning approach that combines quantum calculations with experimental measurements to predict solvation free energies. Models pre-trained on quantum calculations show a clear advantage for small experimental datasets and for out-of-sample predictions. On a random test split, a mean absolute error of 0.21 kcal/mol is achieved.

Abstract

Data scarcity, bias, and experimental noise are frequently encountered problems when deep learning is applied to chemistry and materials science. Transfer learning has proven effective in compensating for the lack of data. The use of quantum calculations in machine learning enables the generation of a diverse dataset and ensures that learning is less affected by the noise inherent to experimental databases. In this work, we propose a transfer learning approach for the prediction of solvation free energies that combines fundamentals from quantum calculations with the higher accuracy of experimental measurements, using two new databases, CombiSolv-QM and CombiSolv-Exp. The employed model architecture is based on the directed message passing neural network for the molecular embedding of solvent and solute molecules. Models pre-trained on quantum calculations show a significant advantage for small experimental datasets and for out-of-sample predictions. The improved out-of-sample performance is shown for new solvents, for new solute elements, and for the extension to solutes of higher molar mass. The overall performance of the pre-trained models is limited by the noise in the experimental test data, known as the aleatoric uncertainty. On a random test split, a mean absolute error of 0.21 kcal/mol is achieved. This is a significant improvement over the mean absolute error of the quantum calculations (0.40 kcal/mol). The error can be further reduced to 0.09 kcal/mol if the model performance is assessed on a more accurate subset of the experimental data.
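The pre-train-then-fine-tune idea described in the abstract can be illustrated with a toy sketch. This is not the paper's method: a linear model stands in for the directed message passing neural network, all data are synthetic, and fine-tuning is approximated by a ridge penalty that pulls the weights toward the pre-trained ones. The names `mae`, `fit_ridge` and `run_demo`, and every numerical setting, are hypothetical choices made for illustration only.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, the kind of metric quoted in the abstract."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def fit_ridge(X, y, w_prior, lam):
    """Minimise ||X w - y||^2 + lam * ||w - w_prior||^2 in closed form."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    b = X.T @ y + lam * w_prior
    return np.linalg.solve(A, b)


def run_demo(seed=0, n_features=20, n_qm=2000, n_exp=15, n_test=500):
    rng = np.random.default_rng(seed)

    # Assumption: the "experimental" truth and the "QM-like" surrogate share
    # most of their structure but differ by a small systematic offset.
    w_exp = rng.normal(size=n_features)
    w_qm = w_exp + 0.1 * rng.normal(size=n_features)

    # Large, noise-free synthetic "QM-like" set used for pre-training.
    X_qm = rng.normal(size=(n_qm, n_features))
    y_qm = X_qm @ w_qm

    # Small, noisy synthetic "experimental-like" set (fewer samples than features).
    X_exp = rng.normal(size=(n_exp, n_features))
    y_exp = X_exp @ w_exp + 0.2 * rng.normal(size=n_exp)

    # Held-out synthetic test set, noise-free so the error reflects the model only.
    X_test = rng.normal(size=(n_test, n_features))
    y_test = X_test @ w_exp

    zeros = np.zeros(n_features)
    w_pre = fit_ridge(X_qm, y_qm, zeros, lam=1e-3)        # pre-train
    w_transfer = fit_ridge(X_exp, y_exp, w_pre, lam=1.0)  # fine-tune from w_pre
    w_scratch = fit_ridge(X_exp, y_exp, zeros, lam=1.0)   # no pre-training

    return {
        "scratch": mae(y_test, X_test @ w_scratch),
        "transfer": mae(y_test, X_test @ w_transfer),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: MAE = {value:.3f} (arbitrary units, synthetic data)")
```

In this toy setting the small dataset cannot determine all weights on its own, so the model that starts from the pre-trained weights is expected to reach a lower test error than the one trained from scratch. The numbers it prints are properties of the synthetic data and say nothing about the values reported in the abstract.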
