Article

Application of feature selection and regression models for chlorophyll-a prediction in a shallow lake

Journal

ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH
Volume 25, Issue 20, Pages 19488-19498

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s11356-018-2147-3

Keywords

Feature selection; Random forest; Minimum redundancy and maximum relevance; Support vector machine

Funding

  1. Tianjin Municipal Education Commission research project [2017KJ125]
  2. National Natural Science Foundation of China [41372373]
  3. Innovation Team Training Plan of the Tianjin Education Committee [TD12-5037]

As a representative indicator of algal blooms, the concentration of chlorophyll-a (Chl-a) is a key parameter of concern for environmental managers. The relationships between environmental variables and Chl-a are complex and difficult to establish. In this study, two machine learning methods, support vector regression (SVR) and random forest (RF), were used to predict Chl-a concentration from multiple variables. To improve model accuracy and reduce the number of inputs, two feature selection methods, minimum redundancy and maximum relevance (mRMR) and RF, were integrated with the regression models. The results showed that the RF model had higher predictive ability than the SVR model. Furthermore, its lower computational cost and the fact that it needs no prior data transformation also indicated better applicability of the RF model. A comparison of the ensemble models mRMR-RF and RF-RF showed that RF-RF performed better with fewer variables. Seven variables selected from the candidate predictors accounted for most of the information, and their potential implications for Chl-a were discussed according to their level of importance. Overall, the RF-RF ensemble model can be considered a useful approach for identifying the significant stressors and achieving satisfactory prediction of Chl-a concentration.
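The mRMR step picks predictors that are strongly related to the target (Chl-a) while penalizing overlap with predictors already chosen. The sketch below is a greedy, pure-Python illustration under one stated assumption: relevance and redundancy are measured with absolute Pearson correlation instead of the mutual information used in the original mRMR formulation, combined with the "difference" criterion (relevance minus mean redundancy). The function names `pearson` and `mrmr_select` and the toy variables `x1`, `x1_copy` and `x2` are hypothetical and are not taken from the paper.

```python
"""Greedy mRMR feature selection (correlation-based variant, an assumption:
the original mRMR uses mutual information rather than Pearson correlation)."""
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def mrmr_select(features: Dict[str, Sequence[float]],
                target: Sequence[float], k: int) -> List[str]:
    """Return up to k feature names in the order the greedy mRMR search picks them."""
    # Relevance: absolute correlation of each candidate with the target.
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected: List[str] = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name: str) -> float:
            if not selected:
                return relevance[name]
            # Redundancy: mean absolute correlation with already-selected features.
            redundancy = sum(abs(pearson(features[name], features[s]))
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical toy data: target = 2*x1 + x2, and x1_copy is a near-copy of x1.
    features = {
        "x1": [1, -1, 1, -1],
        "x1_copy": [2, -2, 2, -1],
        "x2": [1, 1, -1, -1],
    }
    target = [3, -1, 1, -3]
    print(mrmr_select(features, target, k=2))  # ['x1', 'x2']
```

In the toy data, `x1_copy` is more correlated with the target than `x2`, but it is almost entirely redundant with `x1`, so the redundancy penalty makes the search prefer `x2` as the second input.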
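The RF-RF idea (random-forest importance to choose the inputs, then a random forest to predict) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (`make_regression`), importance is scikit-learn's impurity-based `feature_importances_`, hyperparameters are library defaults apart from `n_estimators`, and `k=7` only echoes the seven variables mentioned above. The helper names `rf_rank` and `compare` are hypothetical. The SVR baseline is wrapped in a `StandardScaler` pipeline because SVR is sensitive to feature scale, which is the kind of prior data transformation the random forest does not need.

```python
"""RF-RF sketch (assumes scikit-learn is installed; synthetic data only)."""
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def rf_rank(X: np.ndarray, y: np.ndarray, seed: int = 0) -> np.ndarray:
    """Return column indices sorted by impurity-based RF importance, highest first."""
    rf = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X, y)
    return np.argsort(rf.feature_importances_)[::-1]


def compare(X: np.ndarray, y: np.ndarray, k: int = 7, seed: int = 0) -> dict:
    """Hypothetical helper: RF on the top-k RF-ranked inputs vs. scaled SVR on all inputs."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    # Rank features on the training split only, so test data do not affect selection.
    top = rf_rank(X_tr, y_tr, seed)[:k]
    rf = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X_tr[:, top], y_tr)
    # SVR needs standardized inputs; the pipeline fits the scaler on training data only.
    svr = make_pipeline(StandardScaler(), SVR()).fit(X_tr, y_tr)
    return {
        "selected": top.tolist(),
        "rf_topk_r2": rf.score(X_te[:, top], y_te),  # R^2 on held-out data
        "svr_all_r2": svr.score(X_te, y_te),
    }


if __name__ == "__main__":
    X, y = make_regression(n_samples=300, n_features=15, n_informative=5,
                           noise=5.0, random_state=0)
    print(compare(X, y, k=7))
```

No particular scores are implied here: the numbers depend entirely on the synthetic data and default settings, and comparing the two models on real lake data would also call for tuning SVR's `C`, `epsilon` and kernel parameters.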
