Article

Sampling optimal calibration sets in soil infrared spectroscopy

Journal

GEODERMA
Volume 226, Issue -, Pages 140-150

Publisher

ELSEVIER
DOI: 10.1016/j.geoderma.2014.02.002

Keywords

Soil sensing; Sample size; Kennard-Stone sampling; Latin hypercube sampling; Fuzzy c-means

Funding

  1. State of Sao Paulo Research Foundation (FAPESP) [07/58656-8]

We investigated the effect of both the calibration set size (number of samples) and the calibration sampling strategy on the performance of vis-NIR models to predict clay content and exchangeable Ca (Ca++). We evaluated the following calibration sampling algorithms: Kennard-Stone (KSS), conditioned Latin hypercube (cLHS) and fuzzy c-means (FCMS), which are commonly used in spectroscopy and digital soil mapping. These algorithms were tested separately using a field-scale dataset and a regional-scale dataset. For each dataset, we randomly selected a validation subset, and the remaining samples were used as candidates for calibration sampling. The accuracy of the vis-NIR models of clay content and Ca++ was compared on the basis of the sampling algorithm used for selecting the calibration samples. We also tested 38 different calibration set sizes varying from 10 to 380 samples. The vis-NIR models were calibrated by using the support vector regression machine (SVM) algorithm. The training root mean square error (RMSE), the normalized RMSE and the prediction RMSE were used to evaluate the sensitivity of the models to both the sampling algorithm and the calibration set size. In addition, we investigated the sample representativeness of each algorithm, and we suggest a novel and simple methodology to identify an adequate calibration set size based only on the vis-NIR data (i.e. without prior knowledge of the response variables). As expected, our results show that the error of the soil vis-NIR models depends on the calibration set size. When the number of calibration samples is relatively small, the sampling algorithm may play an important role in the accuracy of the vis-NIR models. On the other hand, if the calibration set size is large enough, the sampling method is not a critical issue. Concerning the sample representativeness, we found for all the algorithms that the original distribution of the vis-NIR data can be better replicated by increasing the calibration set size. The results indicate that the calibration samples selected by the cLHS and by the FCMS algorithms better replicate the original vis-NIR distribution of all the samples, in comparison to those samples selected by the KSS algorithm. (C) 2014 Elsevier B.V. All rights reserved.
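
For readers unfamiliar with the Kennard-Stone procedure named in the abstract, the following is a minimal sketch of how it is commonly described: start from the two most distant candidates, then repeatedly add the candidate that lies farthest from its nearest already-selected sample. This is not the authors' implementation. The function name kennard_stone, the toy data, and the use of plain Euclidean distance on two-dimensional scores are assumptions made for illustration only.

```python
import math


def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by Kennard-Stone sampling.

    Hypothetical helper for illustration; `points` is a list of
    equal-length numeric tuples (e.g. spectra or their PCA scores).
    """
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")

    # Pairwise Euclidean distances (assumed metric, not taken from the paper).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Step 1: seed the calibration set with the two most distant points.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]

    # min_dist[k] is the distance from point k to its nearest selected point.
    min_dist = [min(dist[k][first], dist[k][second]) for k in range(n)]

    # Step 2: repeatedly add the candidate farthest from the selected set.
    while len(selected) < n_select:
        remaining = [k for k in range(n) if k not in selected]
        nxt = max(remaining, key=lambda k: min_dist[k])
        selected.append(nxt)
        min_dist = [min(min_dist[k], dist[k][nxt]) for k in range(n)]

    return selected


# Toy two-dimensional "scores" standing in for compressed vis-NIR spectra.
toy_scores = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (2.5, 2.5), (0.1, 0.1)]
chosen = kennard_stone(toy_scores, 3)
print(chosen)
```

On this toy data the sketch picks the two extreme points first (indices 0 and 3) and then the point midway between them (index 4). The procedure favours the edges of the spectral space before filling the interior, which may help explain why the abstract reports that KSS replicates the original vis-NIR distribution less closely than cLHS or FCMS.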
