Article

An Optimized Approach for Predicting Water Quality Features Based on Machine Learning

Journal

WIRELESS COMMUNICATIONS & MOBILE COMPUTING
Volume 2022

Publisher

WILEY-HINDAWI
DOI: 10.1155/2022/3397972

Funding

  1. Universiti Kebangsaan Malaysia [GUP2019-060]
  2. Institute for Information & Communications Technology Planning & Evaluation (IITP) - Korea government (Ministry of Science and ICT, South Korea) [2022-0-01200] (Training Key Talents in Industrial Convergence Security)

This study uses machine learning classifiers to predict the water quality index (WQI) and to identify the features that matter most for that prediction. An optimized Random Forest classifier trained on WQI parameters selected by information gain achieved the highest performance. Dissolved oxygen (DO) and biochemical oxygen demand (BOD) emerged as important features for predicting the WQI. The proposed model reaches reasonable accuracy with a minimal set of parameters, which makes it a candidate for real-time water quality detection systems.
Traditionally, water quality is assessed using costly laboratory and statistical methods, which makes real-time monitoring impractical. Detecting poor water quality calls for a more practical and cost-effective solution, and machine learning classification appears promising for rapid detection and prediction. Machine learning has been used successfully to predict water quality, but research on machine learning for water quality index (WQI) prediction is generally lacking. This research therefore aims to identify the important features for the WQI, which requires classifying numerous indicators. The study develops four machine learning models (Artificial Neural Network, Support Vector Machine, Random Forest, and Naive Bayes) based on the WQI and chemical parameters. Each model is trained and validated on a dataset for the Langat Basin in Selangor, obtained from the Department of Environment of Malaysia. Several preprocessing tasks, including data cleaning and feature selection, were applied to the raw dataset to ensure the quality of the training data. The performance of the algorithms is further refined using feature sets chosen by several feature selection strategies: information gain, correlation, and symmetrical uncertainty. Each classifier is then optimized over different tuning parameters before the outputs of the classifiers are compared against each other. The experimental results show that the optimized Random Forest classifier, using the WQI parameters selected by the information gain method, achieved the highest performance. They also show that the WQI parameters are more relevant for predicting the WQI than the other variables, and that dissolved oxygen (DO) and biochemical oxygen demand (BOD) in particular are important features. The proposed model achieved reasonable accuracy with minimal parameters, indicating that it could be used in real-time water quality detection systems.
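
The abstract names information gain and symmetrical uncertainty as feature selection strategies but does not describe how they were computed. The following is a minimal sketch of both measures for discrete features, not the authors' implementation. The toy samples, the discretized levels ("high", "low", and so on), and the function names are all hypothetical and chosen only for illustration; they are not drawn from the Langat Basin dataset.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete feature X and class labels Y."""
    total = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # H(Y | X): entropy of the labels within each feature value, weighted by frequency.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def symmetrical_uncertainty(feature, labels):
    """SU = 2 * IG / (H(X) + H(Y)), which normalises information gain to [0, 1]."""
    denom = entropy(feature) + entropy(labels)
    return 0.0 if denom == 0 else 2 * information_gain(feature, labels) / denom


# Hypothetical, hand-made toy samples (an assumption, NOT real monitoring data).
samples = {
    "DO": ["high", "high", "low", "low", "high", "low"],
    "BOD": ["low", "low", "high", "high", "low", "low"],
    "pH": ["neutral", "acidic", "neutral", "acidic", "neutral", "acidic"],
}
wqi_class = ["clean", "clean", "polluted", "polluted", "clean", "polluted"]

# Rank features from most to least informative about the class label.
ranking = sorted(
    samples,
    key=lambda name: information_gain(samples[name], wqi_class),
    reverse=True,
)
print(ranking)  # ['DO', 'BOD', 'pH']
```

In this toy data the DO level separates the two classes perfectly, so it receives the maximum information gain of 1 bit and ranks first. Continuous measurements would need to be discretized (or scored with an estimator designed for continuous inputs) before a ranking like this applies.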
