Leveraging uncertainty estimates and derivative information in Gaussian process regression for efficient collection and use of molecular simulation data

JOURNAL OF CHEMICAL PHYSICS (2023)

Journal

JOURNAL OF CHEMICAL PHYSICS

Volume 158, Issue 16

Publisher

AIP Publishing

DOI: 10.1063/5.0148488


Automated Summary

We introduce Gaussian process regression (GPR) as an enhanced method of thermodynamic extrapolation and interpolation. The heteroscedastic GPR models automatically weight the provided information by its estimated uncertainty, which allows highly uncertain derivative information to be incorporated. We apply GPR models to various data sources, assess active learning strategies, and finally use active-learning data collection to trace vapor-liquid equilibrium for a single-component Lennard-Jones fluid.

Abstract

We introduce Gaussian process regression (GPR) as an enhanced method of thermodynamic extrapolation and interpolation. The heteroscedastic GPR models that we introduce automatically weight the provided information by its estimated uncertainty, allowing for the incorporation of highly uncertain, high-order derivative information. By the linearity of the derivative operator, GPR models naturally handle derivative information. With appropriate likelihood models that incorporate heterogeneous uncertainties, they can also identify estimates of functions for which the provided observations and derivatives are inconsistent due to the sampling bias that is common in molecular simulations. Since we utilize kernels that form complete bases on the function space to be learned, the estimated uncertainty in the model accounts for the uncertainty in the functional form itself, in contrast to polynomial interpolation, which explicitly assumes the functional form to be fixed. We apply GPR models to a variety of data sources and assess various active learning strategies, identifying when specific options will be most useful. Finally, we apply our active-learning data collection, based on GPR models that incorporate derivative information, to tracing vapor-liquid equilibrium for a single-component Lennard-Jones fluid, which we show represents a powerful generalization of previous extrapolation strategies and Gibbs-Duhem integration. A suite of tools implementing these methods is provided at https://github.com/usnistgov/thermo-extrap.
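The two ideas in the abstract, derivative observations entering through the linearity of the derivative operator and per-observation (heteroscedastic) noise weighting, can be illustrated with a minimal NumPy sketch. This is not the thermo-extrap API and not the authors' implementation. It assumes a one-dimensional input, a squared-exponential kernel with fixed hyperparameters, and first derivatives only. The names `rbf_blocks` and `gp_predict` are hypothetical.

```python
import numpy as np


def rbf_blocks(xa, xb, length=1.0, var=1.0):
    """Covariances between (f, f') at xa and (f, f') at xb for an RBF kernel.

    Differentiating the kernel gives the derivative covariances, because the
    derivative of a Gaussian process is again a Gaussian process.
    """
    d = xa[:, None] - xb[None, :]
    k = var * np.exp(-0.5 * d**2 / length**2)         # cov(f(xa),  f(xb))
    k_fd = k * d / length**2                          # cov(f(xa),  f'(xb))
    k_df = -k * d / length**2                         # cov(f'(xa), f(xb))
    k_dd = k * (1.0 / length**2 - d**2 / length**4)   # cov(f'(xa), f'(xb))
    return k, k_fd, k_df, k_dd


def gp_predict(x_train, y, dy, var_y, var_dy, x_test, length=1.0, var=1.0):
    """Posterior mean and variance of f at x_test from noisy values and slopes.

    var_y and var_dy hold one noise variance per observation, so uncertain
    derivative data are down-weighted automatically.
    """
    k, k_fd, k_df, k_dd = rbf_blocks(x_train, x_train, length, var)
    K = np.block([[k, k_fd], [k_df, k_dd]])
    K = K + np.diag(np.concatenate([var_y, var_dy]))  # heteroscedastic noise

    ks, ks_fd, _, _ = rbf_blocks(x_test, x_train, length, var)
    Ks = np.hstack([ks, ks_fd])                       # cov(f(x_test), data)

    targets = np.concatenate([y, dy])
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, targets))
    mean = Ks @ alpha

    v = np.linalg.solve(L, Ks.T)
    variance = np.maximum(var - np.sum(v**2, axis=0), 0.0)
    return mean, variance


# Illustrative data only: values and slopes of sin(x) at four points.
x = np.array([0.0, 1.0, 2.0, 3.0])
x_new = np.array([1.5])
precise = np.full(4, 1e-6)   # small noise variance: trusted observations
vague = np.full(4, 1e6)      # huge noise variance: effectively ignored

mean_trusted, var_trusted = gp_predict(x, np.sin(x), np.cos(x), precise, precise, x_new)
mean_ignored, var_ignored = gp_predict(x, np.sin(x), np.cos(x), precise, vague, x_new)
print(mean_trusted, var_trusted)
print(mean_ignored, var_ignored)
```

In this sketch, trusted derivative data tighten the predicted variance between training points, while derivative data assigned a very large noise variance barely affect the prediction, even when they are inconsistent with the function values.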
