4.6 Article

Determining usefulness of machine learning in materials discovery using simulated research landscapes

Journal

PHYSICAL CHEMISTRY CHEMICAL PHYSICS
Volume 23, Issue 26, Pages 14156-14163

Publisher

ROYAL SOC CHEMISTRY
DOI: 10.1039/d1cp01761f

Funding

  1. EPSRC
  2. ERC (PoC grant) [403098]

The article discusses the impact of data acquisition bias on the effectiveness of machine learning in predicting new materials' performance based on existing experimental data. The concept of simulated research landscapes is introduced to quantify the benefits of using specific machine learning models, revealing a window of opportunity for accelerated discovery of new best-performing materials.
When existing experimental data are combined with machine learning (ML) to predict the performance of new materials, the data acquisition bias determines ML usefulness and the prediction accuracy. In this context, the following two conditions are highly common: (i) constructing new unbiased data sets is too expensive and the global knowledge effectively does not change by performing a limited number of novel measurements; (ii) the performance of the material depends on a limited number of physical parameters, much smaller than the range of variables that can be changed, albeit such parameters are unknown or not measurable. To determine the usefulness of ML under these conditions, we introduce the concept of simulated research landscapes, which describe how datasets of arbitrary complexity evolve over time. Simulated research landscapes allow us to use different discovery strategies to compare standard materials exploration with ML-guided explorations, i.e. we can measure quantitatively the benefit of using a specific ML model. We show that there is a window of opportunity to obtain a significant benefit from ML-guided strategies. The adoption of ML can take place too soon (not enough information to find patterns) or too late (dense datasets only allow for negligible ML benefit), and the adoption of ML can even slow down the discovery process in some cases. We offer a qualitative guide on when ML can accelerate the discovery of new best-performing materials in a field under specific conditions. The answer in each case depends on factors like data dimensionality, corrugation and data collection strategy. We consider how these factors may affect the ML prediction capabilities and discuss some general trends.
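
The comparison described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' method: the 1-D landscape, the nearest-neighbour predictor, and all names (`make_landscape`, `knn_predict`, `explore`) are hypothetical choices made for illustration. It assumes a biased starting dataset drawn only from the first quarter of the candidate space, then counts how many new measurements each strategy needs before it reaches the best-performing candidate.

```python
import math
import random


def make_landscape(n=100, corrugation=0.2, seed=0):
    """Hypothetical 1-D landscape: a smooth trend plus random roughness."""
    rng = random.Random(seed)
    return [math.sin(3.0 * i / n) + corrugation * rng.uniform(-1, 1)
            for i in range(n)]


def knn_predict(x, known, k=3):
    """Predict performance at index x as the mean of the k nearest measured points."""
    nearest = sorted(known, key=lambda i: abs(i - x))[:k]
    return sum(known[i] for i in nearest) / len(nearest)


def explore(landscape, n_initial, guided, seed=0):
    """Count the new measurements needed to find the best candidate.

    Assumption: the initial data are biased, sampled only from the first
    quarter of the space, so n_initial must not exceed len(landscape) // 4.
    """
    rng = random.Random(seed)
    n = len(landscape)
    best = max(range(n), key=lambda i: landscape[i])
    pool = list(range(n // 4))
    known = {i: landscape[i] for i in rng.sample(pool, n_initial)}
    steps = 0
    while best not in known:
        untested = [i for i in range(n) if i not in known]
        if guided:
            # ML-guided: measure the candidate with the highest predicted value.
            pick = max(untested, key=lambda i: knn_predict(i, known))
        else:
            # Baseline: measure a randomly chosen untested candidate.
            pick = rng.choice(untested)
        known[pick] = landscape[pick]
        steps += 1
    return steps


if __name__ == "__main__":
    land = make_landscape()
    print("random exploration:", explore(land, n_initial=10, guided=False))
    print("guided exploration:", explore(land, n_initial=10, guided=True))
```

Repeating the run over a range of `n_initial` and `corrugation` values gives a rough sense of how the benefit of guidance depends on dataset density and landscape roughness. The outcome of this toy model should not be read as a reproduction of the paper's results.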
