
Improving molecular force fields across configurational space by combining supervised and unsupervised machine learning

JOURNAL OF CHEMICAL PHYSICS (2021)

Journal

JOURNAL OF CHEMICAL PHYSICS

Volume 154, Issue 12

Publisher

AIP Publishing

DOI: 10.1063/5.0035530

Categories

Chemistry, Physical; Physics, Atomic, Molecular & Chemical

Funding

Luxembourg National Research Fund (FNR) under the AFR Project [14593813, FNR C19/MS/13718694/QML-FLEX]
FNR DTU-PRIDE MASSENA
European Research Council (ERC-CoG Grant) [BeStMo]

Abstract

In this study, unsupervised and supervised machine learning methods were combined to bypass the bias of the reference data toward common configurations, which reduced force prediction errors on non-equilibrium geometries by up to a factor of two. The approach was also more stable than default training methods, allowing reliable study of processes involving highly out-of-equilibrium molecular configurations. These findings held for every type of machine learning model tested, both kernel-based methods and deep neural networks.

The training set of atomic configurations is key to the performance of any Machine Learning Force Field (MLFF) and, as such, the training set selection determines the applicability of the MLFF model for predictive molecular simulations. However, most atomistic reference datasets are inhomogeneously distributed across configurational space (CS), and thus, choosing the training set randomly or according to the probability distribution of the data leads to models whose accuracy is mainly defined by the most common close-to-equilibrium configurations in the reference data. In this work, we combine unsupervised and supervised ML methods to bypass the inherent bias of the data for common configurations, effectively widening the applicability range of the MLFF to the fullest capabilities of the dataset. To achieve this goal, we first cluster the CS into subregions similar in terms of geometry and energetics. We iteratively test a given MLFF performance on each subregion and fill the training set of the model with the representatives of the most inaccurate parts of the CS. The proposed approach has been applied to a set of small organic molecules and alanine tetrapeptide, demonstrating an up to twofold decrease in the root mean squared errors for force predictions on non-equilibrium geometries of these molecules. Furthermore, our ML models demonstrate superior stability over the default training approaches, allowing reliable study of processes involving highly out-of-equilibrium molecular configurations. These results hold for both kernel-based methods (sGDML and GAP/SOAP models) and deep neural networks (SchNet model).
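The cluster-then-refine loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: a one-dimensional toy function stands in for the potential energy surface, plain k-means stands in for their clustering step, Gaussian kernel ridge regression stands in for the MLFF, and the rule "add the highest-error members of the worst cluster" is an assumed selection rule. All function names and parameters (kmeans, fit_krr, select_training_set, n_init, n_add, n_rounds) are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # skip clusters that became empty
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def fit_krr(X, y, sigma=0.5, lam=1e-4):
    """Gaussian kernel ridge regression, a toy stand-in for an MLFF."""
    def kernel(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    alpha = np.linalg.solve(kernel(X, X) + lam * np.eye(len(X)), y)
    return lambda Xq: kernel(Xq, X) @ alpha


def select_training_set(X, y, labels, n_init=20, n_add=10, n_rounds=5, seed=0):
    """Grow a training set by repeatedly sampling the worst-predicted cluster."""
    rng = np.random.default_rng(seed)
    train = rng.choice(len(X), size=n_init, replace=False).tolist()
    for _ in range(n_rounds):
        model = fit_krr(X[train], y[train])
        err = np.abs(model(X) - y)
        # Candidate points are those not yet in the training set.
        pool = np.setdiff1d(np.arange(len(X)), train)
        if pool.size == 0:
            break
        # Mean error per cluster, evaluated on the candidate pool only.
        cluster_ids = np.unique(labels[pool])
        cluster_err = [err[pool][labels[pool] == c].mean() for c in cluster_ids]
        worst = cluster_ids[int(np.argmax(cluster_err))]
        # Assumed rule: add the highest-error members of the worst cluster.
        cand = pool[labels[pool] == worst]
        new = cand[np.argsort(err[cand])[::-1][:n_add]]
        train.extend(new.tolist())
    return np.array(train)


# Toy data set: most samples sit near "equilibrium" (x close to 0),
# and only a few cover the outer regions, mimicking an inhomogeneous dataset.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 0.3, 450), rng.uniform(-3.0, 3.0, 50)])
X = x[:, None]
y = np.sin(3.0 * x) + 0.5 * x ** 2

# Cluster on geometry and energy together, as the abstract describes.
features = np.column_stack([x, y])
labels = kmeans(features, k=8)
train_idx = select_training_set(X, y, labels)
```

Because each round draws new points from the currently worst-predicted cluster rather than from the overall data distribution, sparsely populated regions can enter the training set even though random sampling would rarely reach them.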
