Journal
JOURNAL OF HYDROLOGY
Volume 610
Publisher
ELSEVIER
DOI: 10.1016/j.jhydrol.2022.127933
Keywords
Machine learning; Groundwater; Arsenic; Weighted cross-entropy; Adaptive synthetic sampling
Funding
- National Natural Science Foundation of China [41772255, 41521001, 42020104005]
Data-driven machine learning models have been used to predict levels of hazardous substances in groundwater. However, class-imbalanced data can give these models low sensitivity despite high overall accuracy. To improve sensitivity, four class-imbalance algorithms were tested. All four produced more accurate predictions, with an average increase in sensitivity of 53.8%. ADASYN performed best, increasing model G-means by over 40% on average. Furthermore, ADASYN-optimized models predicted a higher groundwater arsenic exposure risk in Ghana than in Ethiopia.
Data-driven machine learning models have been used to predict hazardous substance levels in groundwater. However, class-imbalanced data can produce models with grossly low sensitivity even though their overall accuracy is high. To address this issue, four algorithms (weighted cross-entropy loss, random oversampling, random undersampling, and adaptive synthetic sampling (ADASYN)) were tested for their effectiveness in improving model sensitivity. Tests on geogenic high-arsenic groundwater data from the Datong Basin, the Red River Delta of Vietnam, Bangladesh, Texas, and California showed that all four algorithms produced more accurate predictions, with an average increase in sensitivity of 53.8% compared to the raw models. ADASYN performed best of the four and can increase model G-means (the geometric mean of sensitivity and specificity) by >40% on average. The ADASYN-optimized ANN models predicted a higher groundwater arsenic (As) exposure risk in Ghana than in Ethiopia.
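The gap between overall accuracy and sensitivity described in the abstract can be illustrated with a small sketch. The function names and the toy labels below are hypothetical and are not taken from the paper's data or code; the sketch only assumes the usual definitions of sensitivity, specificity, and their geometric mean (G-mean).

```python
import math

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = high arsenic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def g_mean_report(y_true, y_pred):
    """Return accuracy, sensitivity, specificity, and G-mean as a dict."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
    }

# Hypothetical imbalanced sample: 95 low-arsenic wells (0), 5 high-arsenic wells (1).
y_true = [0] * 95 + [1] * 5
# Hypothetical model output: every low-arsenic well is correct,
# but only 1 of the 5 high-arsenic wells is detected.
y_pred = [0] * 95 + [1] + [0] * 4

report = g_mean_report(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

In this toy case accuracy is 0.96 while sensitivity is only 0.2, so the G-mean (about 0.447) exposes the weakness on the minority class that accuracy hides. That is the kind of behavior the resampling and loss-weighting algorithms in the abstract are meant to correct.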