4.8 Review

Science-Driven Atomistic Machine Learning

Journal

ANGEWANDTE CHEMIE-INTERNATIONAL EDITION
Volume 62, Issue 26

Publisher

WILEY-V C H VERLAG GMBH
DOI: 10.1002/anie.202219170

Keywords

Artificial Intelligence; Atomistic Simulations; Chemical Data; Machine Learning; Molecular Dynamics

Machine learning algorithms are powerful tools in science, but large well-curated databases are sparse in chemistry. This contribution reviews science-driven machine learning approaches that focus on atomistic modelling of materials and molecules. Science-driven machine learning involves starting with a scientific question and determining appropriate training data and model design choices. It emphasizes automated and purpose-driven data collection, the use of chemical and physical priors for efficient data usage, and the importance of appropriate model evaluation and error estimation.
Machine learning (ML) algorithms are currently emerging as powerful tools in all areas of science. Conventionally, ML is understood as a fundamentally data-driven endeavour. Unfortunately, large well-curated databases are sparse in chemistry. In this contribution, I therefore review science-driven ML approaches which do not rely on big data, focusing on the atomistic modelling of materials and molecules. In this context, the term science-driven refers to approaches that begin with a scientific question and then ask what training data and model design choices are appropriate. As key features of science-driven ML, the automated and purpose-driven collection of data and the use of chemical and physical priors to achieve high data-efficiency are discussed. Furthermore, the importance of appropriate model evaluation and error estimation is emphasized.
