4.7 Article

Analysing the impact of soil spatial sampling on the performances of Digital Soil Mapping models and their evaluation: A numerical experiment on Quantile Random Forest using clay contents obtained from Vis-NIR-SWIR hyperspectral imagery

Journal

GEODERMA
Volume 375

Publisher

ELSEVIER
DOI: 10.1016/j.geoderma.2020.114503

Keywords

Uncertainty; Sampling methods; Spatial distribution indicators; Quantile Random Forest

Funding

  1. CNES-TOSCA programme
  2. LE STUDIUM Loire Valley Institute for Advanced Research Studies

Abstract

It has long been acknowledged that the soil spatial samplings used as inputs to Digital Soil Mapping (DSM) models are strong drivers, and often limiting factors, of the performance of such models. However, few studies have focused on evaluating this impact and identifying the sampling characteristics responsible for it. In this study, a numerical experiment was conducted using pseudo-values of topsoil clay content obtained from an airborne Visible Near InfraRed-Short Wave InfraRed (Vis-NIR-SWIR) hyperspectral image of the Cap Bon region (Tunisia) as the source of the spatial samplings. Twelve thousand DSM models were built by running a Random Forest algorithm on soil spatial samplings of different sizes and average spacings (from 200 m to 2000 m) and different spatial distributions (from clustered to regularly distributed), aiming to mimic the various situations encountered when handling legacy data. These DSM models were evaluated with regard to both their prediction performance and their ability to estimate their overall and local uncertainties. Three evaluation methods were applied: a model-based one, a classical model-free one using 25% of the sites removed from the initial soil data, and a reference one using a set of 100,000 independent sites selected by stratified random sampling over the entire region.
The results showed that: 1) while, as expected, the performance of the DSM models increased as the sample spacing decreased, this increase diminished at the smallest spacings once 50% of the spatially structured variance was captured by the sampling; 2) samplings that provided complete and even distributions in geographical space and as great a spread of the target soil property as possible increased DSM performance, while complete and even sampling distributions in covariate space had less impact; 3) the overall uncertainty of DSM models was systematically underestimated, all the more so when sparse samplings poorly covered the real distribution of the target soil property and when dense samplings were unevenly distributed in geographical space; 4) local uncertainties were underestimated for sparse samplings and overestimated for dense samplings, while being sensitive to the same sampling characteristics as overall uncertainty. These findings have practical implications for sampling strategies and DSM model evaluation, which are discussed.
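
The model-free evaluation described above (setting aside 25% of the sites, then checking how well the model's uncertainty matches reality) can be illustrated with a minimal sketch. The abstract does not say which uncertainty indicator the authors used, so the interval-coverage check below is an assumption, as are the function names `holdout_split` and `interval_coverage` and all of the numbers, which are made up for illustration.

```python
import random


def holdout_split(sites, test_fraction=0.25, seed=0):
    """Randomly set aside a fraction of the sites for model-free evaluation."""
    rng = random.Random(seed)
    shuffled = list(sites)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled sites are held out; the rest train the model.
    return shuffled[n_test:], shuffled[:n_test]


def interval_coverage(observed, lower, upper):
    """Fraction of observed values that fall inside their prediction interval."""
    inside = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return inside / len(observed)


# Hypothetical site identifiers: 200 sampled locations.
sites = list(range(200))
train_sites, test_sites = holdout_split(sites)

# Made-up clay contents at four held-out sites, with made-up interval bounds
# standing in for the lower and upper quantiles a quantile model would return.
observed = [300.0, 320.0, 280.0, 350.0]
lower = [290.0, 325.0, 250.0, 300.0]
upper = [310.0, 340.0, 290.0, 340.0]

coverage = interval_coverage(observed, lower, upper)
print(f"held-out sites: {len(test_sites)}, coverage: {coverage:.2f}")
```

If the intervals were meant to be 90% prediction intervals, a coverage well below 0.90 would indicate that the model underestimates its uncertainty, and a coverage well above 0.90 that it overestimates it.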
