Article

Reliability Assessment of Machine Learning Models in Hydrological Predictions Through Metamorphic Testing

Journal

WATER RESOURCES RESEARCH
Volume 57, Issue 9

Publisher

AMER GEOPHYSICAL UNION
DOI: 10.1029/2020WR029471

Keywords

metamorphic testing; machine learning; reliability assessment; model testing; hydrological modeling

Funding

  1. Research Grants Council of the Hong Kong Special Administrative Region, China
  2. RGC Theme-based Research Scheme [T21-711/16-R]

Abstract

This study introduces a method based on metamorphic testing (MT) to assess the prediction reliability of machine learning models in hydrological studies where the actual outputs are unavailable. Prediction accuracy and prediction consistency were found to be uncorrelated, and the study examined factors that influence the assessment result for a given input, such as its similarity to observed data. Overall, MT is shown to be an effective method for detecting inconsistent model predictions and is recommended when a model is used to make predictions under new conditions.
The reliability of a machine learning model's prediction for a given input can be assessed by comparing it against the actual output. However, in hydrological studies, machine learning models are often adopted to predict future or unknown events, where the actual outputs are unavailable. The prediction accuracy of a model, which measures its average performance across an observed data set, may not be relevant for a specific input. This study presents a method based on metamorphic testing (MT), adopted from software engineering, to assess prediction reliability where the actual outputs are unknown. In this method, the predictions for a group of related inputs are considered consistent only if the inputs and outputs follow certain relations that are deduced from the properties of the system being modeled. For instance, in a rainfall-runoff model, the predicted runoff volume should increase as the rainfall magnitude of an input increases. In this study, the MT-based method was applied to assess the predictions made by various machine learning models that were trained to predict the magnitude of flood events in Germany. Surprisingly, the prediction accuracy of a model and its ability to provide consistent predictions were found to be uncorrelated. This study further investigated the factors influencing the assessment result of a given input, such as its similarity to observed data. Overall, this research shows that MT is an effective and simple method for detecting inconsistent model predictions and is recommended when a model is employed to make predictions under new conditions.
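
The rainfall-runoff relation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two toy models, the feature layout (rainfall, soil moisture), the scaling factors, and all function names are assumptions made purely for illustration. The check builds a group of related inputs by scaling only the rainfall feature and reports whether the predictions never decrease, without needing any observed runoff.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def related_inputs(source: Sequence[float], rainfall_index: int,
                   factors: Sequence[float]) -> List[List[float]]:
    """Build a group of related inputs by scaling only the rainfall feature."""
    group = []
    for factor in factors:
        x = list(source)
        x[rainfall_index] = source[rainfall_index] * factor
        group.append(x)
    return group


def is_consistent(model: Model, source: Sequence[float],
                  rainfall_index: int = 0,
                  factors: Sequence[float] = (1.0, 1.1, 1.2, 1.5),
                  tol: float = 1e-9) -> bool:
    """Return True if predictions never decrease as rainfall increases.

    Assumes `factors` is sorted in ascending order and rainfall is positive.
    """
    preds = [model(x) for x in related_inputs(source, rainfall_index, factors)]
    return all(later >= earlier - tol
               for earlier, later in zip(preds, preds[1:]))


# Hypothetical stand-ins for trained models (assumed for illustration only).
def monotone_model(x: Sequence[float]) -> float:
    rainfall, soil_moisture = x
    return 0.6 * rainfall + 5.0 * soil_moisture


def non_monotone_model(x: Sequence[float]) -> float:
    # Predicted runoff starts to fall once rainfall exceeds 75 units.
    rainfall, soil_moisture = x
    return 0.6 * rainfall - 0.004 * rainfall ** 2 + 5.0 * soil_moisture


if __name__ == "__main__":
    for source in ([10.0, 0.3], [100.0, 0.3]):
        print(source,
              is_consistent(monotone_model, source),
              is_consistent(non_monotone_model, source))
```

In this sketch the second toy model passes the check for the low-rainfall input but fails it for the high-rainfall one, which mirrors the abstract's point that the assessment is made per input rather than as an average over a data set.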
