4.2 Article

MRMR-EHO-Based Feature Selection Algorithm for Regression Modelling

Journal

TEHNICKI VJESNIK-TECHNICAL GAZETTE
Volume 30, Issue 2, Pages 574-583

Publisher

UNIV OSIJEK, TECH FAC
DOI: 10.17559/TV-20221119040501

Keywords

data mining; elephant herding optimization; feature selection; machine learning; MRMR


In classical regression theory, a single function model is fit to a data set, a process that becomes overly complex and unreliable in noisy domains. To overcome these difficulties, this paper proposes piecewise regression models combined with proper feature selection. A hybrid of Elephant Herding Optimization (EHO) and minimum Redundancy and Maximum Relevance (mRMR) is used for feature selection to improve regression performance. The results demonstrate the effectiveness of CUBIST with mRMR-EHO feature selection on various datasets and indicate that the combination can be used as an effective tool for predictive data modeling.
In classical regression theory, a single function model is fit to a data set. In a complex and noisy domain, this process is too complex and/or unreliable. Piecewise regression models offer a way to overcome these difficulties, and regression performance can be improved further by proper feature selection. This paper proposes a feature selection technique for regression problems that hybridizes filter and wrapper feature selection methods, using a framework that combines Elephant Herding Optimization (EHO) with minimum Redundancy and Maximum Relevance (mRMR). The mRMR-EHO method is implemented to maximize the performance of individual regression algorithms, and the results are reported in this research. The effectiveness of CUBIST with mRMR-EHO feature selection is demonstrated empirically on six fine-grained datasets ranging from small-sized data to big data: a) Strawberry Plants Nutrient water supply, b) Steel Industry Energy Consumption, c) Seoul Bike Sharing Demand, d) Seoul Bike Trip duration, e) Appliances energy consumption, and f) Capital Bike share program data. The results show a marginal increase in performance, even at very large scale. All six datasets were pre-processed before the models were built. The empirical results are based on the following algorithms: a) Generalized Linear Regression, b) K nearest neighbour, c) Random Forest, d) Support Vector Machine, e) Gradient Boosting Machine, and f) CUBIST. Their performances are compared, and the best-performing model is selected. Ultimately, this paper puts forth that mRMR-EHO-based feature selection with the rule-based CUBIST model for regression can be used as an effective tool for predictive data modelling in various domains.
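To make the filter half of the hybrid concrete, the following is a minimal sketch of greedy mRMR ranking, not the authors' implementation. It rests on several assumptions: absolute Pearson correlation stands in for mutual information as the relevance and redundancy measure, the "relevance minus mean redundancy" difference form is used, and the names `pearson`, `mrmr_rank`, and `make_toy_data` are hypothetical. The EHO wrapper stage and the CUBIST model described in the abstract are not shown.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mrmr_rank(columns, target, k):
    """Greedily pick up to k feature names from `columns` (dict: name -> values).

    Assumption: |Pearson correlation| replaces mutual information, so only
    linear dependence is measured.
    """
    relevance = {name: abs(pearson(col, target)) for name, col in columns.items()}
    selected = []
    remaining = list(columns)
    while remaining and len(selected) < k:

        def score(name):
            # The first pick is by relevance alone; later picks are penalised
            # by their mean redundancy with the features already selected.
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(columns[name], columns[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def make_toy_data(n=500, seed=0):
    """Synthetic data: y depends on x1 and x2; x1_copy nearly duplicates x1."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    x1_copy = [a + rng.gauss(0, 0.05) for a in x1]
    noise = [rng.gauss(0, 1) for _ in range(n)]
    y = [2 * a + b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]
    return {"x1": x1, "x1_copy": x1_copy, "x2": x2, "noise": noise}, y


if __name__ == "__main__":
    cols, y = make_toy_data()
    print(mrmr_rank(cols, y, 2))
```

On this toy data the near-duplicate of `x1` is penalised for redundancy, so the second pick goes to `x2` even though the duplicate has higher relevance on its own. In a hybrid scheme such as the one the abstract describes, a ranking like this would typically narrow the candidate set before a wrapper search evaluates subsets against a regression model; how the paper couples the two stages is not specified here.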

