Article

Analysing effectiveness of grey theory-based feature selection for meteorological estimation models

Journal

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.engappai.2023.106243

Keywords

Feature selection; Grey theory; Machine learning; Meteorology


This study analyzes the effectiveness of the application of grey theory in feature selection for daily dew point temperature and daily pan-evaporation estimation models. Comparisons and analyses are made between the feature subset identified by grey theory and subsets selected based on different Pearson correlation coefficient slabs. The results show that the models using grey theory-based feature selection demonstrated average or above-average performances.
Grey theory can represent uncertainty and has proved applicable to prioritizing features in estimation problems and in various decision-making problems. This study analyses the effectiveness of grey theory in feature selection for daily dew point temperature (DPT) and daily pan-evaporation (PAN-EVP) estimation models. The feature subset identified by grey theory is compared with subsets selected from the very high, high, medium, and low Pearson correlation coefficient (PCC) slabs. Random Forest (RF) and Extreme Gradient Boosting (XGBoost) are used for modelling. Model performance is evaluated using the root mean squared error (RMSE), mean absolute error (MAE), and coefficient of determination (R²). The results show that models built on the high PCC feature subset underperformed on both datasets. The models with features selected using grey theory and the medium PCC slab performed identically for both datasets. For the PAN-EVP dataset, the grey theory-based subset is larger, and both the RF and XGBoost models built on it gave accuracy measures within the calculated average, unlike the models built on the medium PCC slab subset. For the DPT dataset, the grey theory-based subset is smaller, and the RF model built on it gave accuracy measures within the calculated average, unlike the model built on the medium PCC slab subset. For DPT estimation with the XGBoost model, both the grey theory and medium PCC slab subsets gave accuracy measures close to the calculated average. The study concludes that models using grey theory-based feature selection demonstrated average or above-average performance, and that grey theory is therefore an effective feature selection technique.
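The abstract names the ingredients of the comparison but not their formulas, so the following is a minimal sketch in plain Python of how the two feature-ranking approaches and the three accuracy measures could be computed. Several things here are assumptions and not taken from the paper: the slab boundaries in `SLABS` are hypothetical, the grey relational grade follows the commonly used Deng formulation with distinguishing coefficient `zeta = 0.5` and min-max normalisation, and all function names are illustrative. The RF and gradient-boosting models themselves are not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical lower bounds on |PCC| for each slab (not stated in the abstract).
SLABS = [("very high", 0.9), ("high", 0.7), ("medium", 0.5), ("low", 0.0)]

def pcc_slab(r):
    """Return the first slab whose lower bound is reached by |r|."""
    for name, lower in SLABS:
        if abs(r) >= lower:
            return name
    return "low"

def min_max(values):
    """Scale a non-constant sequence to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def grey_relational_grades(features, target, zeta=0.5):
    """Grey relational grade of each feature against the target (assumed Deng form).

    `features` maps a feature name to its column of values.
    """
    ref = min_max(target)
    deltas = {
        name: [abs(r - v) for r, v in zip(ref, min_max(col))]
        for name, col in features.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:  # every feature coincides with the target after scaling
        return {name: 1.0 for name in features}
    grades = {}
    for name, ds in deltas.items():
        coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in ds]
        grades[name] = sum(coeffs) / len(coeffs)
    return grades

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot
```

Under these assumptions, a feature would be placed in a PCC slab by `pcc_slab(pearson(column, target))`, while the grey theory-based subset would be formed by ranking the features on their grade from `grey_relational_grades` and keeping those above a chosen cut-off, which the abstract does not specify.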
