Article

Soil organic carbon stock prediction using multi-spatial resolutions of environmental variables: How well does the prediction match local references?

Journal

CATENA
Volume 229

Publisher

Elsevier
DOI: 10.1016/j.catena.2023.107197

Keywords

Soil Carbon; gSSURGO; SoilGrids; Machine learning; Scaling impact; Earth observation; Digital soil mapping; Auxiliary data


This study evaluated the impact of the spatial resolution of environmental variables on the prediction of soil organic carbon stocks. The random forest algorithm outperformed the other algorithms in estimating soil organic carbon stocks at all spatial resolutions tested. For the gSSURGO-based models, soil maps and geology/landform maps had a strong influence on the predictions. The results highlight the importance of data sources, model type, and the combination of environmental variables in predicting soil organic carbon stocks.
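The highlights above credit random forest and soil-map covariates. A minimal sketch of that kind of workflow follows, assuming scikit-learn and entirely synthetic data: a random 70 %/30 % train/test split (the proportions the study reports), a `RandomForestRegressor`, R² and RMSE on the held-out 30 %, and permutation importance to rank covariates. The covariate names (`drainage_class`, `elevation`, `mean_annual_precip`, `noise`) and the data-generating formula are hypothetical; the study's 47 covariates, competing algorithms, and tuning are not reproduced, and permutation importance is only one of several importance measures, not necessarily the one the authors used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 600

# Hypothetical covariates (illustrative only, not the study's variables).
names = ["drainage_class", "elevation", "mean_annual_precip", "noise"]
X = np.column_stack([
    rng.integers(1, 8, n),     # ordinal soil-map attribute (classes 1-7)
    rng.uniform(30, 1300, n),  # elevation (m)
    rng.uniform(900, 1500, n), # precipitation (mm), unrelated to y here
    rng.normal(size=n),        # irrelevant covariate
])

# Synthetic SOC stock (t C ha^-1), driven mostly by the soil-map attribute.
y = 40 + 12 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(0, 5, n)

# Random 70 % training / 30 % evaluation split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
rf = RandomForestRegressor(n_estimators=300, random_state=0)
rf.fit(X_train, y_train)

# Internal validation on the held-out 30 %.
pred = rf.predict(X_test)
r2 = r2_score(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))

# Rank covariates by the mean drop in score when each one is shuffled.
imp = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
ranking = [names[i] for i in np.argsort(imp.importances_mean)[::-1]]

print(f"R2 = {r2:.2f}, RMSE = {rmse:.2f}")
print("importance ranking:", ranking)
```

Because the synthetic target is built mainly from `drainage_class`, that covariate should top the ranking; with real gSSURGO or SoilGrids training data the ranking is an empirical result, which is exactly what the study found to differ between the two databases.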
We evaluated how the spatial resolution of environmental variables (n = 47) altered their ability to predict soil organic carbon (SOC) stocks (0-30 cm depth) using training data from the Gridded Soil Survey Geographic (gSSURGO) and SoilGrids databases. Training and validation subsamples (n = 1,629) were selected using a conditioned Latin hypercube sampling (cLHS) design based on environmental variables in Vermont, U.S. Predictive relationships between the environmental variables and SOC stock (t C ha⁻¹) were developed using machine learning algorithms. Each algorithm was trained on a random 70 % subset of the database subsamples and evaluated on the remaining 30 %, with an additional evaluation step using local, independent SOC reference data (n = 272). The Random Forest (RF) algorithm outperformed the other algorithms in estimating SOC stocks at all spatial resolutions. As spatial resolution increased, model performance with the gSSURGO database improved (R² = 0.33-0.62 and RMSE = 42.42-34.92), whereas no such trend was observed for the SoilGrids database. The best SOC stock prediction with the SoilGrids database was achieved at 10 m resolution (R² = 0.54 and RMSE = 4.67). Evaluation against the external, independent reference data showed a marked decrease in prediction accuracy compared with the internal validation (R² = 0.11-0.14 for gSSURGO and R² = -0.19 for SoilGrids). For the gSSURGO database, soil maps (including suborders, drainage classes, temperature, and moisture) and geology/landform maps had a greater influence than other environmental variables at all spatial resolutions. In contrast, climatic and digital elevation model (DEM)-related variables were more important for the SoilGrids database. These results suggest that the origin of the SOC stock database and the sampling scheme largely affect the importance that the machine learning algorithm assigns to each environmental variable.
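The cLHS design described above selects a subset of sites whose covariate values cover each variable's distribution evenly while preserving the correlations among covariates. Below is a simplified sketch in the spirit of the simulated-annealing cLHS of Minasny and McBratney (2006), assuming continuous covariates only, equal weights on the two objective terms, and plain random swaps as annealing moves. The function name `clhs` and its parameters are hypothetical; it does not handle categorical layers such as the soil and geology maps used in the study, and an established implementation (for example, the R `clhs` package) is preferable for real work.

```python
import numpy as np


def clhs(X, n, iterations=5000, temp=1.0, cooling=0.999, seed=0):
    """Pick n row indices of X with a simplified conditioned Latin hypercube.

    Hypothetical sketch: continuous covariates only, objective weights
    fixed to 1, and each annealing step swaps one sampled site at random.
    """
    rng = np.random.default_rng(seed)
    N, k = X.shape
    # Quantile edges split each covariate into n equal-probability strata.
    edges = np.quantile(X, np.linspace(0.0, 1.0, n + 1), axis=0)  # (n+1, k)
    corr_pop = np.corrcoef(X, rowvar=False)

    def objective(idx):
        S = X[idx]
        # O1: deviation from exactly one sampled site per stratum.
        o1 = 0.0
        for j in range(k):
            counts, _ = np.histogram(S[:, j], bins=edges[:, j])
            o1 += np.abs(counts - 1).sum()
        # O3: mismatch between sample and population correlation matrices.
        o3 = np.abs(corr_pop - np.corrcoef(S, rowvar=False)).sum()
        return o1 + o3

    idx = rng.choice(N, size=n, replace=False)
    current = objective(idx)
    best_idx, best = idx.copy(), current
    for _ in range(iterations):
        # Propose replacing one sampled site with a random unsampled site.
        candidate = idx.copy()
        pool = np.setdiff1d(np.arange(N), idx)
        candidate[rng.integers(n)] = rng.choice(pool)
        value = objective(candidate)
        delta = value - current
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < np.exp(-delta / temp):
            idx, current = candidate, value
            if current < best:
                best_idx, best = idx.copy(), current
        temp *= cooling  # geometric cooling schedule
    return np.sort(best_idx), best
```

With `X` holding one row of covariate values per candidate grid cell, `clhs(X, n=...)` returns the row indices of the selected sites and the final objective value; the selection could then be split at random into training and validation sets.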
Our results confirmed that the sources of variables and data, the model type, and the combination of environmental variables significantly influenced prediction accuracy. In conclusion, digital soil mapping (DSM) products should be re-evaluated against local references when they are used over spatial extents different from those for which they were originally designed.
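Re-evaluating a DSM product against local references amounts to computing the same accuracy metrics at the reference sites. A negative R², like the -0.19 reported for SoilGrids, is possible only when R² is computed as 1 - SS_res/SS_tot rather than as a squared correlation; it means the map fits the local references worse than simply predicting their mean. The sketch below assumes that definition; the function name `r2_rmse` and all reference and prediction values are made up for illustration and are not the study's data.

```python
import math


def r2_rmse(observed, predicted):
    """Return (R2, RMSE) with R2 defined as 1 - SS_res / SS_tot."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Hypothetical local reference SOC stocks (t C ha^-1) at four sites.
reference = [60.0, 80.0, 100.0, 120.0]
# A map that follows the local gradient with small errors.
tracking_map = [65.0, 78.0, 95.0, 118.0]
# A map that misses the local gradient and is biased high.
flat_map = [110.0, 110.0, 110.0, 110.0]

for label, pred in [("tracking", tracking_map), ("flat", flat_map)]:
    r2, rmse = r2_rmse(reference, pred)
    print(f"{label}: R2 = {r2:.3f}, RMSE = {rmse:.2f}")
# tracking: R2 = 0.971, RMSE = 3.81
# flat: R2 = -0.800, RMSE = 30.00
```

Running the same function on an internal holdout and on independent local references makes the gap between the two evaluations directly comparable.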


