Article

Automatic recommendation of feature selection algorithms based on dataset characteristics

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 185

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2021.115589

Keywords

Feature engineering; Characterization measures; Algorithm selection; Recommendation system; Filter; Wrapper

Funding

  1. Brazilian National Council for Scientific and Technological Development [140159/2017-7, 142050/2019-9]
  2. Araucaria Foundation [028/2019]


Feature selection in real-world data mining problems is essential to make the learning task efficient and more accurate. Identifying the best feature selection algorithm among the many available is a complex activity that still relies heavily on human experts or on random trial and error. The automated machine learning community has therefore taken steps toward automating this process. In this paper, we address the metalearning challenge of recommending feature selection algorithms by proposing a novel meta-feature engineering model. Our model considers a broad collection of meta-features that enable the study of the relationship between dataset properties and feature selection algorithm performance in terms of several criteria. We arrange the input meta-features into eight categories: (i) simple, (ii) statistical, (iii) information-theoretical, (iv) complexity, (v) landmarking, (vi) based on symbolic models, (vii) based on images, and (viii) based on complex networks (graphs). The target meta-features emerge from a multi-criteria performance measure, built from five individual performance indexes, that assesses feature selection methods grounded in information, distance, dependence, consistency, and precision measures. We evaluate our proposal using a recently developed framework that extracts the input meta-features from 213 benchmark datasets and ranks the assessed feature selection algorithms to fill in the target meta-features of the meta-bases. This evaluation uses five state-of-the-art classification methods to induce recommendation models from the meta-bases: C4.5, Random Forest, XGBoost, ANN, and SVM. The results show that recommendation models built with our meta-feature engineering model reach an average accuracy of up to 90%. This work is the first to use an extensive empirical evaluation to provide a careful discussion of the strengths and limitations of more than 160 meta-features.
These meta-features, while designed to aid the task of feature selection algorithm recommendation, can readily be employed in other metalearning scenarios. Therefore, we believe our findings are a valuable contribution to the fields of automated machine learning and data mining, as well as to the feature extraction and pattern recognition communities.
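The abstract does not state how the five performance indexes are combined or which meta-features drive a recommendation, so the following is only a minimal sketch of the general metalearning loop it describes, under loudly stated assumptions. Each criterion ranks the candidate algorithms, the algorithm with the lowest mean rank becomes the target label for that dataset (an assumed aggregation, not necessarily the paper's measure), and a 1-nearest-neighbour rule recommends an algorithm for a new dataset from four simple and information-theoretical meta-features. The 1-NN rule is a stand-in for the C4.5, Random Forest, XGBoost, ANN, and SVM models used in the paper. The criterion keys, the labels `fs_A`/`fs_B`, the function names, and every score are hypothetical.

```python
import math
from collections import Counter

# Hypothetical keys for the five criteria named in the abstract; the paper's
# actual indexes and their combination are NOT reproduced here.
CRITERIA = ("information", "distance", "dependence", "consistency", "precision")


def ranks(values):
    """Rank values in descending order (1 = best); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def best_algorithm(scores):
    """scores: {algorithm: {criterion: score}}, higher is better.
    Assumed aggregation: lowest mean rank across CRITERIA wins (ties broken by name)."""
    algos = sorted(scores)
    mean_rank = {a: 0.0 for a in algos}
    for c in CRITERIA:
        for a, r in zip(algos, ranks([scores[a][c] for a in algos])):
            mean_rank[a] += r / len(CRITERIA)
    return min(algos, key=lambda a: (mean_rank[a], a))


def simple_meta_features(X, y):
    """Log #instances, log #features, #classes, class entropy (bits)."""
    n, d = len(X), len(X[0])
    counts = Counter(y)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return [math.log(n), math.log(d), float(len(counts)), entropy]


def recommend(meta_base, query):
    """1-NN over a meta-base given as a list of (meta_features, label) rows."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return min(meta_base, key=lambda row: dist(row[0], query))[1]


# Toy meta-base with made-up scores: fs_A wins 3 of 5 criteria on dataset 1,
# fs_B wins 3 of 5 on dataset 2.
X1, y1 = [[0, 1], [1, 0], [1, 1], [0, 0]], [0, 1, 0, 1]
X2 = [[i, 2 * i, i % 3] for i in range(9)]
y2 = [i % 3 for i in range(9)]
scores_1 = {
    "fs_A": dict(zip(CRITERIA, [0.9, 0.8, 0.7, 0.5, 0.4])),
    "fs_B": dict(zip(CRITERIA, [0.6, 0.7, 0.6, 0.8, 0.9])),
}
scores_2 = {
    "fs_A": dict(zip(CRITERIA, [0.5, 0.4, 0.3, 0.9, 0.9])),
    "fs_B": dict(zip(CRITERIA, [0.8, 0.9, 0.7, 0.6, 0.5])),
}
meta_base = [
    (simple_meta_features(X1, y1), best_algorithm(scores_1)),
    (simple_meta_features(X2, y2), best_algorithm(scores_2)),
]
query_mf = simple_meta_features([[0, 1], [1, 1], [0, 0], [1, 0], [1, 1]], [0, 0, 1, 1, 1])
print(recommend(meta_base, query_mf))  # fs_A
```

The new two-class, two-feature dataset lies closest to dataset 1 in meta-feature space, so it inherits that dataset's best algorithm. Log-scaling the size meta-features is one simple way to keep large datasets from dominating the distance; a real meta-base with 160+ meta-features would normally also standardize each one.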
