Article

Comparative study on total nitrogen prediction in wastewater treatment plant and effect of various feature selection methods on machine learning algorithms performance

Journal

JOURNAL OF WATER PROCESS ENGINEERING
Volume 41

Publisher

ELSEVIER
DOI: 10.1016/j.jwpe.2021.102033

Keywords

Machine learning; ANN; RF; GBM; Feature selection; Total nitrogen


This study evaluated the effect of seven Feature Selection methods on the prediction accuracy for total nitrogen in wastewater treatment plants. Scenario IV, suggested by Mutual Information, performed best. In addition, Gradient Boosting Machine gave the best performance on the unseen test data set, indicating its effectiveness for predicting wastewater components.
Predicting wastewater characteristics in wastewater treatment plants (WWTPs) is valuable because it can reduce sampling frequency, energy use, and cost. Feature Selection (FS) methods are applied in the pre-processing stage to enhance model performance. This study evaluates the effect of seven FS methods (filter, wrapper, and embedded methods) on the prediction accuracy for total nitrogen (TN) in the WWTP influent flow. Four scenarios based on the FS suggestions were defined and compared using three supervised Machine Learning (ML) algorithms: Artificial Neural Network (ANN), Random Forest (RF), and Gradient Boosting Machine (GBM). Daily time series of pH, DO, COD, BOD, MLSS, MLVSS, NH4-N, and TN concentration were used as input parameters. The data set was divided into a training set and an unseen test set, and the performance of all models was evaluated using Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and correlation coefficient (R2). The results show that scenario IV, suggested by Mutual Information and comprising NH4-N, COD, BOD, and DO, outperformed the scenarios suggested by the other FS methods. Furthermore, the decision-tree-based algorithms (RF and GBM) performed better than the neural network algorithm (ANN). GBM generalized the data set patterns very well and produced the best performance on the unseen test set, which shows the effectiveness of this state-of-the-art ML algorithm for predicting wastewater components.
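The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names and the TN values (in mg/L) are hypothetical. It also assumes that R2 means the squared Pearson correlation between observed and predicted values, which is one common reading of "correlation coefficient (R2)". The coefficient of determination is another, and the abstract does not say which was used.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean Absolute Error between two equal-length sequences."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Squared Pearson correlation (assumed meaning of R2 here)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


# Hypothetical observed and predicted influent TN values (mg/L),
# standing in for an unseen test set.
observed_tn = [40.0, 42.0, 44.0, 46.0]
predicted_tn = [41.0, 41.0, 45.0, 47.0]

print("RMSE:", rmse(observed_tn, predicted_tn))
print("MAE:", mae(observed_tn, predicted_tn))
print("R2:", r_squared(observed_tn, predicted_tn))
```

Lower RMSE and MAE and a higher R2 on the unseen test set indicate better generalization, which is the basis on which the abstract ranks GBM above RF and ANN.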

