Article

An efficient henry gas solubility optimization for feature selection

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 152

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2020.113364

Keywords

Classification; Dimensionality reduction; Feature selection (FS); Henry gas solubility optimization (HGSO); Pattern recognition

In classification, regression, and other data mining applications, feature selection (FS) is an important preprocessing step that helps avoid the adverse effects of noisy, misleading, and inconsistent features on model performance. Formulating FS as a global combinatorial optimization problem, researchers have employed metaheuristic algorithms to select the prominent features, simplifying high-dimensional datasets and enhancing their quality in order to devise efficient knowledge extraction systems. However, when applied to datasets with a very large number of features, these methods often suffer from the local optimality problem because the solution space is considerably large. In this study, we propose a novel approach to dimensionality reduction that uses the Henry gas solubility optimization (HGSO) algorithm to select significant features and thereby enhance classification accuracy. Using several datasets with a wide range of feature sizes, from small to massive, the proposed method is evaluated against well-known metaheuristic algorithms, including the grasshopper optimization algorithm (GOA), whale optimization algorithm (WOA), dragonfly algorithm (DA), grey wolf optimizer (GWO), salp swarm algorithm (SSA), and others from the recent relevant literature. We used k-nearest neighbor (k-NN) and support vector machine (SVM) as expert systems to evaluate the selected feature set. Wilcoxon's rank-sum non-parametric statistical test was carried out at the 5% significance level to judge whether the results of the proposed algorithm differ from those of the compared algorithms in a statistically significant way. Overall, the empirical analysis suggests that the proposed approach is significantly effective on low-dimensional as well as considerably high-dimensional datasets, producing 100% accuracy on classification problems with more than 11,000 features. © 2020 Elsevier Ltd. All rights reserved.
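To make the wrapper idea in the abstract concrete, the following is a minimal sketch of how a candidate feature subset might be scored with a k-NN classifier. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the fitness function, so the weighted form used here (error rate plus a small penalty on subset size, with a hypothetical weight `alpha`), the leave-one-out evaluation, the toy data, and all function names are assumptions. The HGSO search itself is not reproduced; a metaheuristic would call a function like `fitness` on each binary mask it proposes.

```python
import random


def knn_loo_accuracy(X, y, mask, k=3):
    """Leave-one-out accuracy of k-NN using only features where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        # Squared Euclidean distance to every other sample on selected columns.
        dists = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in cols), y[o])
            for o, xo in enumerate(X)
            if o != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)  # majority vote
        correct += predicted == y[i]
    return correct / len(X)


def fitness(X, y, mask, alpha=0.99, k=3):
    """Lower is better. ASSUMED form: alpha * error + (1 - alpha) * subset ratio."""
    error = 1.0 - knn_loo_accuracy(X, y, mask, k)
    ratio = sum(mask) / len(mask)
    return alpha * error + (1.0 - alpha) * ratio


def make_toy_data(n=40, seed=0):
    """Hypothetical data: feature 0 carries the class signal, 1-3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        signal = label * 5.0 + rng.gauss(0, 0.5)
        noise = [rng.gauss(0, 5.0) for _ in range(3)]
        X.append([signal] + noise)
        y.append(label)
    return X, y


X, y = make_toy_data()
informative = fitness(X, y, [1, 0, 0, 0])  # keep only the signal feature
noisy = fitness(X, y, [0, 1, 1, 1])        # keep only the noise features
print(f"informative subset: {informative:.4f}, noisy subset: {noisy:.4f}")
```

On this toy data the mask that keeps only the informative feature should score lower (better) than the mask that keeps only noise, which is the signal a search algorithm would exploit when exploring the space of binary masks.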
