Article

Comparison and interpretation of data-driven models for simulating site-specific human-impacted groundwater dynamics in the North China Plain

Journal

JOURNAL OF HYDROLOGY
Volume 616

Publisher

ELSEVIER
DOI: 10.1016/j.jhydrol.2022.128751

Keywords

Groundwater dynamics; Data-driven models; Machine learning; LSTM; North China Plain


Data-driven models (DDMs) are increasingly popular in groundwater hydrology due to advances in machine learning algorithms and readily accessible data. This study compares deep learning (DL) algorithms with traditional tree-based machine learning (TB) algorithms for groundwater level simulation and investigates the importance of different input features. The results show that deep-learning-based DDMs outperform tree-based models in terms of correlation with observed data, and that the encoder-decoder-LSTM model performs best. Factors related to human activities are found to have a stronger impact on groundwater level variation than other factors. Preprocessing of driving factors helps achieve satisfactory simulations, especially in data-scarce areas.
Data-driven models (DDMs) have gained popularity in groundwater hydrology in recent years, owing to advances in machine learning algorithms and the ready availability of data. For groundwater applications, the differences between deep learning (DL) algorithms and traditional tree-based machine learning (TB) algorithms have not been fully investigated, and the importance of different input features for groundwater level simulation has rarely been addressed. In this study, we test and validate six DDMs for simulating groundwater levels at selected boreholes in the North China Plain (NCP). The NCP is a large alluvial aquifer system (144,000 km²) that has been overexploited by massive water withdrawals since the 1960s. Four of the DDMs were tree-based (random forest, XGBoost, gradient boosting regression, LightGBM), and two were deep learning algorithms (Vanilla-LSTM and encoder-decoder-LSTM). The results showed that deep-learning-based DDMs correlated better with observed data than tree-based models. Encoder-decoder-LSTM performed best among all DDMs and generated compelling results (R² = 0.61, RMSE = 0.73 m), even though each individual driving factor had a low correlation with the simulation target. Gini coefficient analysis and permutation feature importance analysis were used to rank the model driving factors and make the results interpretable. The results showed that factors related to human activities had a much stronger impact on groundwater level variation than other factors. Preprocessing the driving factors helps produce satisfactory simulations aimed at sustainable water management and aquifer restoration, especially in data-scarce areas.
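The evaluation metrics and the permutation feature importance analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the two toy features (`pumping`, `precipitation`), and the linear stand-in model are all hypothetical, and R² is assumed here to mean the coefficient of determination, which the abstract does not specify.

```python
import math
import random


def rmse(observed, simulated):
    """Root-mean-square error between observed and simulated levels."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def r2(observed, simulated):
    """Coefficient of determination (assumed meaning of R2 in this sketch)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, rows, observed, n_repeats=20, seed=0):
    """Mean increase in RMSE when one feature column is shuffled.

    predict: callable mapping one feature row (a list) to a prediction.
    rows:    list of feature rows, each a list of floats.
    """
    rng = random.Random(seed)
    baseline = rmse(observed, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            score = rmse(observed, [predict(r) for r in shuffled])
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy data: columns are [pumping, precipitation].
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(50)]


def toy_model(row):
    """Stand-in for a trained DDM; the coefficients are made up."""
    pumping, precipitation = row
    return -2.0 * pumping + 0.1 * precipitation


observed = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, observed)
```

In this toy setup the stand-in model depends far more on the first column than on the second, so shuffling it degrades the RMSE more and it ranks higher. The same ranking logic applies to any fitted model that exposes a prediction function.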
