Article

Tuning support vector machines regression models improves prediction accuracy of soil properties in MIR spectroscopy

Journal

GEODERMA
Volume 365

Publisher

ELSEVIER
DOI: 10.1016/j.geoderma.2020.114227

Keywords

Machine-learning; Kernel; Error-grid; FTIR; RMSE

Funding

  1. Foundation for Food and Agricultural Research
  2. School of Environment and Natural Resources at Ohio State University

Estimating soil properties with diffuse reflectance infrared Fourier transform spectroscopy in the mid-infrared region (mid-DRIFTS) relies on statistical modeling (chemometrics) to predict soil properties from spectra. The choice of modeling approach can strongly affect prediction accuracy. However, the impact of selecting the best parameters for an algorithm (tuning) to optimize non-linear models for predicting soil properties remains relatively unexplored in soil science. This study aimed to evaluate the predictive performance of linear (partial least squares, PLS) and non-linear (support vector machines, SVM) multivariate regression models in estimating soil physical, chemical, and biological properties with mid-DRIFTS. We evaluated the impact of optimizing two hyperparameters, epsilon and cost, which control how the epsilon-insensitive loss function of SVM models tolerates noise, using two contrasting and diverse sets of soils: one from northern Tanzania (n = 533) and one from the US Midwest (n = 400). Regression models were trained on calibration sets (75%) and tested on independent validation sets (25%), separately for each dataset. Support vector machines outperformed PLS models for all tested soil properties (clay, sand, pH, total organic carbon, and permanganate oxidizable carbon) in both datasets. Tuning epsilon and cost maintained or improved the prediction accuracy of SVM models, as measured by the root mean squared error (RMSE) of the independent validation sets. The tuned SVM hyperparameters differed among soil properties, and also for the same soil property in different datasets, suggesting that non-linear models need to be parameterized for specific soil properties and datasets. Optimizing SVM regression models in mid-DRIFTS improves the prediction accuracy of soil properties and is therefore likely to yield more robust predictions, even in datasets with diverse land uses, parent materials, and/or soil orders.
We recommend including tuning as a routine step when using SVM to estimate soil properties.
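The tuning workflow summarized above (a 75/25 calibration/validation split, a search over epsilon and cost, and RMSE on the held-out set) can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' code: it swaps in a linear SVR fitted by subgradient descent as a simplified stand-in for the kernel SVM used in the study, tunes on an inner hold-out split of the calibration set (the abstract does not say how tuning was done), and runs on synthetic toy data. All function names, grid values, and data here are hypothetical.

```python
import math
import random
from itertools import product


def eps_insensitive_loss(residual, epsilon):
    """Epsilon-insensitive loss: residuals inside the +/- epsilon tube cost nothing."""
    return max(0.0, abs(residual) - epsilon)


def rmse(y_true, y_pred):
    """Root mean squared error, the validation metric used to compare models."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def predict(model, X):
    """Linear prediction w.x + b for every row of X."""
    w, b = model
    return [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X]


def fit_linear_svr(X, y, epsilon, cost, epochs=200, lr0=0.1):
    """Minimize 0.5*||w||^2 + cost * mean(eps-insensitive loss) by subgradient descent.

    Hypothetical linear stand-in for a kernel SVM; a larger `cost` penalizes
    residuals outside the epsilon tube more heavily relative to the regularizer.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        lr = lr0 / math.sqrt(t)  # diminishing step size
        grad_w, grad_b = list(w), 0.0  # start from the gradient of 0.5*||w||^2
        residuals = [yi - pi for yi, pi in zip(y, predict((w, b), X))]
        for xi, r in zip(X, residuals):
            if abs(r) > epsilon:  # only points outside the tube contribute
                s = 1.0 if r > 0 else -1.0
                for j in range(d):
                    grad_w[j] -= cost * s * xi[j] / n
                grad_b -= cost * s / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def split(X, y, frac, seed):
    """Random split: a `frac` share of shuffled samples goes to the first subset."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    k = round(frac * len(idx))
    first, second = idx[:k], idx[k:]
    return ([X[i] for i in first], [y[i] for i in first],
            [X[i] for i in second], [y[i] for i in second])


def tune(X_cal, y_cal, epsilons, costs, seed=0):
    """Grid-search (epsilon, cost) using an inner hold-out of the calibration set only."""
    X_fit, y_fit, X_tune, y_tune = split(X_cal, y_cal, 0.75, seed)
    scores = {}
    for eps, c in product(epsilons, costs):
        model = fit_linear_svr(X_fit, y_fit, eps, c)
        scores[(eps, c)] = rmse(y_tune, predict(model, X_tune))
    return min(scores, key=scores.get)  # pair with the lowest tuning RMSE


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic toy data with 3 features; NOT spectra or soil data from the study.
    X = [[rng.random() for _ in range(3)] for _ in range(200)]
    y = [2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2] + rng.gauss(0.0, 0.05) for x in X]
    X_cal, y_cal, X_val, y_val = split(X, y, 0.75, seed=1)  # 75% calibration / 25% validation
    eps, cost = tune(X_cal, y_cal, epsilons=[0.01, 0.1, 0.5], costs=[0.1, 1.0, 10.0])
    model = fit_linear_svr(X_cal, y_cal, eps, cost)  # refit on the full calibration set
    print(f"epsilon={eps}, cost={cost}, validation RMSE={rmse(y_val, predict(model, X_val)):.3f}")
```

In practice, a library implementation would normally replace the hand-written fit; for example, scikit-learn's `sklearn.svm.SVR` exposes the same two hyperparameters as `C` and `epsilon` (with an RBF kernel by default), and `GridSearchCV` can run the search. The grid values above are illustrative only; as the abstract notes, suitable values differ among soil properties and datasets.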
