Article

Validation of uncertainty predictions in digital soil mapping

Journal

GEODERMA
Volume 437

Publisher

ELSEVIER
DOI: 10.1016/j.geoderma.2023.116585

Keywords

Validation; Digital soil mapping; Uncertainty; Machine learning; Proper scoring rules; Quantile regression


In digital soil mapping, probabilistic predictions are commonly used, but their validation is often overlooked. By adopting metrics from the broader probabilistic literature, the reliability and sharpness of these predictions can be evaluated. In a case study, the probabilistic predictions of five different models were compared; quantile regression forest (QRF) and quantile regression post-processing of a random forest (QRPP RF) performed best, and the null model (NM) performed worst.
It is quite common in digital soil mapping (DSM) to quantify the uncertainty of issued predictions, that is, to make probabilistic predictions. Yet little attention has been paid to the validation of these predictions. Probabilistic predictions are only of value to end users if they are reliable and, ideally, also sharp. Reliability refers to the consistency between predicted conditional probabilities and observed frequencies of independent test data. Sharpness refers to the concentration of a conditional probability distribution function, i.e. its narrowness. The prediction interval coverage probability (PICP) is currently used in DSM to validate the reliability of prediction intervals, but it is insensitive to a potential one-sided bias of the interval boundaries. We therefore propose to extend the current validation procedure with metrics used in the broader probabilistic literature. These metrics evaluate probabilistic predictions not only in prediction interval format but also as quantiles or full conditional probability distributions. We suggest the quantile coverage probability (QCP) and the probability integral transform (PIT) histogram as alternatives to PICP, and proper scoring rules for relative comparisons of competing probabilistic models. As scoring rules, we present the interval score (IS) and the continuous ranked probability score (CRPS), which can be decomposed into a reliability part (RELI). We illustrated the use of these metrics in a case study on soil pH and soil organic carbon from the LUCAS-soil database, in which the probabilistic predictions of five models were compared: a reference null model (NM), quantile regression forest (QRF), quantile regression post-processing of a random forest (QRPP RF), kriging with external drift (KED) and quantile regression neural network (QRNN). For KED and QRNN, a one-sided bias was found. This was not apparent from PICP but was revealed by the PIT histogram and QCP. RELI summarized the trends found in QCP, PICP and the PIT histograms in a single numerical value. CRPS and IS penalized outliers and low sharpness particularly severely. According to CRPS and IS, the best probabilistic predictions were obtained by QRF and QRPP RF and the worst by NM.
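
For readers who want to reproduce such a validation, the sketch below outlines how these metrics could be computed from predicted quantiles and independent test observations. It is a minimal illustration, not the authors' code: the function names, the quantile-based CRPS approximation and the toy data are assumptions for demonstration purposes only.

```python
# Minimal sketch (not the authors' implementation): validation metrics for
# probabilistic predictions given as predicted quantiles. All names and the
# toy data below are hypothetical.
import numpy as np
from scipy.stats import norm

def picp(y, lower, upper):
    """Prediction interval coverage probability: share of test observations
    falling inside the [lower, upper] prediction interval."""
    return np.mean((y >= lower) & (y <= upper))

def qcp(y, q_pred):
    """Quantile coverage probability for one quantile level: share of test
    observations at or below the predicted quantile."""
    return np.mean(y <= q_pred)

def interval_score(y, lower, upper, alpha):
    """Interval score of a central (1 - alpha) prediction interval
    (Gneiting & Raftery, 2007); lower values are better."""
    width = upper - lower
    below = (2.0 / alpha) * (lower - y) * (y < lower)
    above = (2.0 / alpha) * (y - upper) * (y > upper)
    return np.mean(width + below + above)

def crps_from_quantiles(y, levels, q_pred):
    """Approximate CRPS as twice the mean pinball (quantile) loss over a
    dense grid of predicted quantiles; lower values are better.
    q_pred has shape (n_obs, n_levels), with quantiles non-decreasing per row."""
    ind = (y[:, None] <= q_pred).astype(float)
    pinball = (ind - levels[None, :]) * (q_pred - y[:, None])
    return np.mean(2.0 * pinball)

def pit_from_quantiles(y, levels, q_pred):
    """Probability integral transform: predicted CDF evaluated at each test
    observation, interpolated from the quantile grid (values beyond the grid
    are clipped to the outermost levels). A flat histogram of these values
    indicates reliable predictions."""
    return np.array([np.interp(yi, qi, levels) for yi, qi in zip(y, q_pred)])

# Toy check with a standard normal predictive distribution at every location:
rng = np.random.default_rng(0)
y = rng.normal(size=1000)
levels = np.linspace(0.01, 0.99, 99)
q_pred = np.tile(norm.ppf(levels), (y.size, 1))
print(picp(y, q_pred[:, 4], q_pred[:, 94]))    # close to 0.90 for the 90% interval
print(qcp(y, q_pred[:, 89]))                   # close to 0.90 for the 0.9 quantile
print(crps_from_quantiles(y, levels, q_pred))  # lower is better
```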

