Article

Evaluating the validity of class balancing algorithms-based machine learning models for geogenic contaminated groundwaters prediction

Journal

JOURNAL OF HYDROLOGY
Volume 610

Publisher

ELSEVIER
DOI: 10.1016/j.jhydrol.2022.127933

Keywords

Machine learning; Groundwater; Arsenic; Weighted cross-entropy; Adaptive synthetic sampling

Funding

  1. National Natural Science Foundation of China [41772255, 41521001, 42020104005]

Data-driven machine learning models have been used to predict levels of hazardous substances in groundwater. However, class-imbalanced data leads to low sensitivity in these models, despite high overall accuracy. To improve sensitivity, four class balancing algorithms were tested. The results showed that all four algorithms produced more accurate predictions, with an average increase in sensitivity of 53.8%. ADASYN performed best, increasing model G-means by over 40% on average. Furthermore, ADASYN-optimized models predicted higher groundwater arsenic exposure risk in Ghana than in Ethiopia.
Data-driven machine learning models have been used to predict hazardous substance levels in groundwater. However, class-imbalanced data can yield models with grossly low sensitivity even though they show high overall accuracy. To address this issue, four algorithms, namely weighted cross-entropy loss, random oversampling, random undersampling, and adaptive synthetic sampling (ADASYN), were tested for their validity in improving model sensitivity. Tests using geogenic high-arsenic groundwater data from the Datong Basin, the Red River Delta of Vietnam, Bangladesh, Texas, and California showed that all four algorithms produced more accurate predictions, with an average increase in sensitivity of 53.8% compared to the raw models. ADASYN was the best of the four algorithms and increased model G-means (the geometric mean of sensitivity and specificity) by >40% on average. The ADASYN-optimized artificial neural network (ANN) models predicted higher groundwater arsenic (As) exposure risk in Ghana than in Ethiopia.
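To illustrate why overall accuracy can hide low sensitivity on class-imbalanced data, and why the G-mean is a stricter score, the following minimal Python sketch computes sensitivity, specificity, accuracy, and G-mean for two hypothetical classifiers. The well counts and predictions are invented for illustration and are not data from the paper; the function names `confusion_counts` and `scores` are likewise assumptions of this sketch.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp) for binary labels, where 1 = high-arsenic well."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def scores(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, and their geometric mean."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Hypothetical imbalanced data set: 95 safe wells (0), 5 high-arsenic wells (1).
y_true = [0] * 95 + [1] * 5

# Classifier A labels every well as safe.
always_safe = [0] * 100

# Classifier B raises 10 false alarms but detects 4 of the 5 contaminated wells.
more_sensitive = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1

naive = scores(y_true, always_safe)
better = scores(y_true, more_sensitive)

for name, result in (("always safe", naive), ("more sensitive", better)):
    print(name, {k: round(v, 3) for k, v in result.items()})
```

In this made-up case, classifier A has the higher accuracy but a sensitivity and G-mean of zero, while classifier B trades a little accuracy for a much higher G-mean. This is the kind of trade-off that class balancing methods such as those named in the abstract aim to achieve.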
