4.2 Article

An Optimized Approach for Predicting Water Quality Features Based on Machine Learning

Publisher

WILEY-HINDAWI
DOI: 10.1155/2022/3397972

Funding

  1. Universiti Kebangsaan Malaysia [GUP2019-060]
  2. Institute for Information & Communications Technology Planning & Evaluation (IITP) - Korea government (Ministry of Science and ICT, South Korea) [2022-0-01200] (Training Key Talents in Industrial Convergence Security)

This study uses machine learning classification methods to predict the water quality index (WQI) and to identify the features most important for that prediction. The optimized Random Forest classifier with the WQI parameters selected by information gain achieved the highest performance. The results indicate that dissolved oxygen (DO) and biochemical oxygen demand (BOD) are important features for predicting WQI. The proposed model reaches reasonable accuracy with minimal parameters, making it a candidate for real-time water quality detection systems.
Traditionally, water quality is assessed using costly laboratory and statistical methods, which makes real-time monitoring impractical. Poor water quality calls for a more practical and cost-effective solution, and machine learning classification appears promising for rapid detection and prediction of water quality. Although machine learning has been used successfully to predict water quality, research on machine learning for water quality index (WQI) prediction is generally lacking. This research therefore aims to identify the important features for the WQI, which required classifying numerous indicators. The study develops four machine learning models (Artificial Neural Network, Support Vector Machine, Random Forest, and Naive Bayes) based on the WQI and chemical parameters. A dataset for the Langat Basin in Selangor, obtained from the Department of Environment of Malaysia, is used to train and validate each model. Several preprocessing tasks, such as data cleaning and feature selection, were carried out on the raw dataset to ensure the quality of the training data. The performance of the algorithms is then refined using feature subsets chosen by several feature selection strategies, such as information gain, correlation, and symmetrical uncertainty. Each classifier is tuned over different parameter settings before the classifiers are compared against each other. The experimental results show that the optimized Random Forest classifier, using the WQI parameters selected by the information gain method, achieved the highest performance, and that the WQI parameters are more relevant for predicting the WQI than the other variables. In particular, dissolved oxygen (DO) and biochemical oxygen demand (BOD) are important features for predicting WQI.
The proposed model achieved reasonable accuracy with minimal parameters, indicating that it could be used in real-time water quality detection systems.
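
The abstract names information gain and symmetrical uncertainty as feature selection strategies but does not show how they are computed. The following is a minimal sketch of both scores for discretized features, using the standard entropy-based definitions. It is not the authors' implementation: the function names, the column names (`DO_bin`, `pH_bin`), and the toy values are all hypothetical and made up for illustration, and continuous sensor readings are assumed to have been binned beforehand.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    counts = Counter(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(feature, labels):
    """IG = H(class) - H(class | feature) for one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def symmetrical_uncertainty(feature, labels):
    """SU = 2 * IG / (H(feature) + H(class)), a normalised score in [0, 1]."""
    denom = entropy(feature) + entropy(labels)
    return 0.0 if denom == 0 else 2.0 * information_gain(feature, labels) / denom


def rank_features(columns, labels, score=information_gain):
    """Return feature names sorted from most to least informative."""
    return sorted(columns, key=lambda name: score(columns[name], labels), reverse=True)


# Invented toy data (not from the Langat Basin dataset): binned readings
# and a made-up two-level water quality class.
toy_columns = {
    "DO_bin": ["high", "high", "low", "low", "high", "low"],
    "pH_bin": ["mid", "low", "mid", "low", "mid", "low"],
}
toy_class = ["clean", "clean", "polluted", "polluted", "clean", "polluted"]

ranking = rank_features(toy_columns, toy_class)
print(ranking)
```

A feature subset would then be formed by keeping the top-ranked names and passing only those columns to the classifier. Passing `score=symmetrical_uncertainty` to `rank_features` switches the ranking criterion without changing anything else.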
