Impact of dataset uncertainties on machine learning model predictions: the example of polymer glass transition temperatures

Publisher

IOP PUBLISHING LTD
DOI: 10.1088/1361-651X/aaf8ca

Keywords

machine learning; polymers; glass transition temperature

Funding

  1. Toyota Research Institute through the Accelerated Materials Design and Discovery program

Over the past decade, there has been a resurgence in the importance of data-driven techniques in materials science and engineering. The utilization of state-of-the-art algorithms, coupled with the increased availability of experimental and computational data, has led to the development of surrogate models offering the promise of rapid and accurate predictions of materials' properties based solely on their structure or composition. Such machine learning (ML) models are trained on available past data and are thus susceptible to the intrinsic uncertainties and errors associated with these past measurements. The glass transition temperature (T_g) of polymers, a property of paramount interest in polymer science, is one strong example of a material property whose reported value can vary widely as a result of intrinsic and extrinsic factors that arise during the experimental measurement process. In the current work, we curate a large database of T_g measurements from a variety of data sources and investigate the statistical nature of the inherent uncertainties in the database. By partitioning the dataset using statistically relevant measures, we investigate the effect of variations in the dataset on the performance of the final ML model. We demonstrate that the median, as a measure of central tendency, is a valid approximation when dealing with multiple reported values of T_g for the same polymeric material. Moreover, the Bayesian model noise/uncertainty that emerges from our machine-learning pipeline is able to represent quantitatively the underlying noise/uncertainties in the experimental measurement of T_g.
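The median aggregation described in the abstract can be illustrated with a minimal sketch. It assumes the curated data arrive as (polymer, T_g) pairs. The function name `aggregate_by_median`, the polymer labels, and the numeric values are hypothetical placeholders chosen for illustration, not data or code from the paper.

```python
from collections import defaultdict
from statistics import median


def aggregate_by_median(records):
    """Collapse repeated (polymer, Tg) reports into one median Tg per polymer."""
    grouped = defaultdict(list)
    for polymer, tg_kelvin in records:
        grouped[polymer].append(tg_kelvin)
    # One representative value per polymer: the median of all its reports.
    return {polymer: median(values) for polymer, values in grouped.items()}


# Hypothetical placeholder reports (in kelvin), NOT real measurements.
reports = [
    ("polymer_A", 370.0),
    ("polymer_A", 378.0),
    ("polymer_A", 410.0),  # an outlying report barely shifts the median
    ("polymer_B", 250.0),
    ("polymer_B", 260.0),
]

aggregated = aggregate_by_median(reports)
print(aggregated)
```

One reason a median may be preferred in this setting is that it is less sensitive than the mean to a single outlying report, as the third `polymer_A` entry suggests.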
