☆ 4.7 Article

Hybrid decision tree-based machine learning models for short-term water quality prediction

CHEMOSPHERE (2020)

Journal

CHEMOSPHERE

Volume 249, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.chemosphere.2020.126169

Keywords

Decision tree-based model; Short-term; Water quality prediction; Extreme gradient boosting; Random forest; Data denoising

Funding

Open Fund of State Key Laboratory of Oil and Gas Reservoir Geology and Exploitation (Southwest Petroleum University) [PLN201710]
National Natural Science Foundation of China [71901184]
Humanities and Social Science Fund of Ministry of Education of China [19YJCZH119]
China Scholarship Council [201708030006]

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Abstract

Water resources are the foundation of people's life and economic development, and are closely related to health and the environment. Accurate prediction of water quality is the key to improving water management and pollution control. In this paper, two novel hybrid decision tree-based machine learning models are proposed to obtain more accurate short-term water quality prediction results. The basic models of the two hybrid models are extreme gradient boosting (XGBoost) and random forest (RF), which respectively introduce an advanced data denoising technique - complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN). Taking the water resources of Gales Creek site in Tualatin River (one of the most polluted rivers in the world) Basin as an example, a total of 1875 data (hourly data) from May 1, 2019 to July 20, 2019 are collected. Two hybrid models are used to predict six water quality indicators, including water temperature, dissolved oxygen, pH value, specific conductance, turbidity, and fluorescent dissolved organic matter. Six error metrics are introduced as the basis of performance evaluation, and the results of the two models are compared with the other four conventional models. The results reveal that: (1) CEEMDAN-RF performs best in the prediction of temperature, dissolved oxygen and specific conductance, the mean absolute percentage errors (MAPEs) are 0.69%, 1.05%, and 0.90%, respectively. CEEMDAN-XGBoost performs best in the prediction of pH value, turbidity, and fluorescent dissolved organic matter, the MAPEs are 0.27%, 14.94%, and 1.59%, respectively. (2) The average MAPEs of CEEMDAN-RF and CEEMMDAN-XGBoost models are the smallest, which are 3.90% and 3.71% respectively, indicating that their overall prediction performance is the best. In addition, the stability of the prediction model is also discussed in this paper. The analysis shows that the prediction stability of CEEMDAN-RF and CEEMDAN-XGBoost is higher than other benchmark models. (C) 2020 Elsevier Ltd. All rights reserved.

Hybrid decision tree-based machine learning models for short-term water quality prediction

Journal

CHEMOSPHERE

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Hybrid decision tree-based machine learning models for short-term water quality prediction

Journal

CHEMOSPHERE

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper