4.6 Article

Evaluating the Performance of Machine Learning Approaches to Predict the Microbial Quality of Surface Waters and to Optimize the Sampling Effort

期刊

WATER
卷 13, 期 18, 页码 -

出版社

MDPI
DOI: 10.3390/w13182457

关键词

water quality prediction; machine learning; Escherichia coli concentration; optimized sampling; river monitoring

向作者/读者索取更多资源

The study proposed a framework for optimizing model selection and enriching the training dataset, which was successfully applied to predict Escherichia coli concentrations in the Marne River. The Random Forest model showed the best accuracy among six machine learning-based models, but some E. coli densities were inaccurately estimated, suggesting the need to monitor key variables such as temperature, conductivity, rainfall, and river flow.
Exposure to contaminated water during aquatic recreational activities can lead to gastrointestinal diseases. In order to decrease the exposure risk, the fecal indicator bacteria Escherichia coli is routinely monitored, which is time-consuming, labor-intensive, and costly. To assist the stakeholders in the daily management of bathing sites, models have been developed to predict the microbiological quality. However, model performances are highly dependent on the quality of the input data which are usually scarce. In our study, we proposed a conceptual framework for optimizing the selection of the most adapted model, and to enrich the training dataset. This frameword was successfully applied to the prediction of Escherichia coli concentrations in the Marne River (Paris Area, France). We compared the performance of six machine learning (ML)-based models: K-nearest neighbors, Decision Tree, Support Vector Machines, Bagging, Random Forest, and Adaptive boosting. Based on several statistical metrics, the Random Forest model presented the best accuracy compared to the other models. However, 53.2 +/- 3.5% of the predicted E. coli densities were inaccurately estimated according to the mean absolute percentage error (MAPE). Four parameters (temperature, conductivity, 24 h cumulative rainfall of the previous day the sampling, and the river flow) were identified as key variables to be monitored for optimization of the ML model. The set of values to be optimized will feed an alert system for monitoring the microbiological quality of the water through combined strategy of in situ manual sampling and the deployment of a network of sensors. Based on these results, we propose a guideline for ML model selection and sampling optimization.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.6
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据