Article

Spatial or Random Cross-Validation? The Effect of Resampling Methods in Predicting Groundwater Salinity with Machine Learning in Mediterranean Region

Journal

WATER
Volume 15, Issue 12

Publisher

MDPI
DOI: 10.3390/w15122278

Keywords

cross-validation; spatial mapping; machine learning; spatial autocorrelation; groundwater salinity

Machine learning (ML) algorithms are widely used for their high prediction accuracy, but overfitting and inadvertent biases can make their reported performance overly optimistic. Spatial data are particularly prone to this problem because of their intrinsic spatial autocorrelation. Spatial cross-validation (SCV) has emerged as a resampling method designed to address this issue. This study compared SCV with conventional random cross-validation (CCV) in predicting groundwater electrical conductivity (EC) across different datasets. The results showed that SCV yields ML models that generalize better and reduces the over-optimism bias associated with CCV. SCV could therefore be applied in studies that combine spatial data with machine learning.
Machine learning (ML) algorithms are used extensively and often achieve outstanding prediction accuracy. In some cases, however, their tendency to overfit, together with inadvertent biases, can produce overly optimistic results. Spatial data are a special case: their intrinsic spatial autocorrelation can bias ML models and the estimates of their performance. A resampling method called spatial cross-validation (SCV) has emerged to address this issue. The purpose of this study was to evaluate the performance of SCV against the conventional random cross-validation (CCV) used in most ML studies. Multiple ML models were created with CCV and SCV to predict groundwater electrical conductivity (EC) using data (A) from Rhodope, Greece, in the summer of 2020; (B) from the same area at a different time (summer 2019); and (C) from a new area (the Salento peninsula, Italy). The results showed that SCV provides ML models with superior generalization capabilities and, hence, better predictions on new, unseen data. SCV appears to capture the spatial patterns in the data while also reducing the over-optimism bias often associated with CCV. Based on these results, SCV could be applied with ML in studies that use spatial data.
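
The difference between the two resampling schemes can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the spatial folds were built, so square blocking is assumed here as one common SCV variant. The well coordinates and EC-like values are synthetic, the 1-nearest-neighbour predictor is only a stand-in for an ML model, and all function names and parameters (`random_folds`, `spatial_block_folds`, `block_size`, `nn_rmse`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def random_folds(n_points, k, seed=0):
    """Conventional random CV: shuffle point indices and deal them into k folds."""
    rng = random.Random(seed)
    idx = list(range(n_points))
    rng.shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]


def spatial_block_folds(coords, k, block_size, seed=0):
    """Spatial CV (assumed blocking variant): whole square blocks go to one fold."""
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        key = (math.floor(x / block_size), math.floor(y / block_size))
        blocks[key].append(i)
    block_ids = sorted(blocks)
    random.Random(seed).shuffle(block_ids)
    folds = [[] for _ in range(k)]
    for j, block_id in enumerate(block_ids):
        folds[j % k].extend(blocks[block_id])
    return [sorted(fold) for fold in folds]


def nn_rmse(coords, values, folds):
    """Cross-validated RMSE of a 1-nearest-neighbour predictor (stand-in model)."""
    squared_errors = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(coords)) if i not in held_out]
        for i in test:
            nearest = min(train, key=lambda j: math.dist(coords[i], coords[j]))
            squared_errors.append((values[i] - values[nearest]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Synthetic, spatially autocorrelated data (not the study's measurements).
rng = random.Random(42)
coords = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(300)]
values = [math.sin(x / 15) + math.cos(y / 15) + rng.gauss(0, 0.05)
          for x, y in coords]

ccv = random_folds(len(coords), k=5)
scv = spatial_block_folds(coords, k=5, block_size=25)

print("CCV RMSE:", round(nn_rmse(coords, values, ccv), 3))
print("SCV RMSE:", round(nn_rmse(coords, values, scv), 3))
```

With random folds, a held-out point usually has a close neighbour in the training set, so the error estimate benefits from spatial autocorrelation. With blocked folds, those neighbours are held out together, which on data like this would typically be expected to give a larger and more conservative error estimate.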

