4.6 Article

Spatial or Random Cross-Validation? The Effect of Resampling Methods in Predicting Groundwater Salinity with Machine Learning in Mediterranean Region

Journal

WATER
Volume 15, Issue 12

Publisher

MDPI
DOI: 10.3390/w15122278

Keywords

cross-validation; spatial mapping; machine learning; spatial autocorrelation; groundwater salinity

Machine learning algorithms are widely used for their high prediction accuracy, but they may produce overly optimistic results due to overfitting and inadvertent biases. Spatial data, with their intrinsic spatial autocorrelation, can introduce biases to machine learning. Spatial cross-validation (SCV) has emerged as a special resampling method to address this issue. This study compared the performance of SCV with conventional random cross-validation (CCV) in predicting groundwater electrical conductivity (EC) using different datasets. The results showed that SCV provides ML models with better generalization capabilities and reduces the over-optimism bias associated with CCV methods. SCV could be applied in studies that use spatial data and machine learning.
Machine learning (ML) algorithms are widely used and often achieve high prediction accuracy. However, their tendency to overfit, together with inadvertent biases, can produce overly optimistic results. Spatial data are a special case: their intrinsic spatial autocorrelation can bias ML models. A resampling method called spatial cross-validation (SCV) has emerged to address this issue. This study evaluated the performance of SCV against the conventional random cross-validation (CCV) used in most ML studies. Multiple ML models were built with CCV and SCV to predict groundwater electrical conductivity (EC) using data (A) from Rhodope, Greece, in the summer of 2020; (B) from the same area at a different time (summer 2019); and (C) from a new area (the Salento peninsula, Italy). The results showed that SCV gives ML models superior generalization capabilities and, hence, better prediction results on new, unseen data. SCV appears to capture the spatial patterns in the data while reducing the over-optimism bias often associated with CCV. Based on these results, SCV could be applied with ML in studies that use spatial data.
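To make the difference between the two resampling schemes concrete, here is a minimal sketch in plain Python. The abstract does not state which SCV variant the authors used, so the block-based fold assignment below is an assumption, as are the function names (`random_folds`, `spatial_block_folds`, `train_test_splits`), the `block_size` parameter, and the synthetic grid of sampling points. It only illustrates the idea that CCV scatters neighbouring samples across folds, while a spatial scheme keeps nearby samples in the same fold.

```python
import random
from collections import defaultdict


def random_folds(n_samples, k, seed=0):
    """Conventional random CV: shuffle sample indices and deal them into k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]


def spatial_block_folds(coords, k, block_size, seed=0):
    """Spatial CV sketch (assumed block variant): bin points into square
    blocks of side `block_size`, then assign whole blocks to folds."""
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        blocks[(int(x // block_size), int(y // block_size))].append(i)
    block_ids = sorted(blocks)
    random.Random(seed).shuffle(block_ids)
    folds = [[] for _ in range(k)]
    for j, block_id in enumerate(block_ids):
        folds[j % k].extend(blocks[block_id])
    return [sorted(fold) for fold in folds]


def train_test_splits(folds):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield sorted(train), test


# Hypothetical sampling locations on a 10 x 10 grid (not the study's wells).
coords = [(float(x), float(y)) for x in range(10) for y in range(10)]

ccv = random_folds(len(coords), k=4)
scv = spatial_block_folds(coords, k=4, block_size=5.0)


def blocks_in_fold(fold, block_size=5.0):
    """Set of spatial blocks that a fold's samples come from."""
    return {(int(coords[i][0] // block_size), int(coords[i][1] // block_size))
            for i in fold}


for name, folds in (("CCV", ccv), ("SCV", scv)):
    print(name, [len(blocks_in_fold(fold)) for fold in folds])
```

With this setup every SCV test fold comes from a single spatial block, whereas CCV test folds typically draw from several blocks, so their test samples sit next to training samples. Under spatial autocorrelation, that proximity is the mechanism by which random resampling can make error estimates look better than they would be on a new area.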
