Journal
KNOWLEDGE-BASED SYSTEMS
Volume 220, Issue -, Pages -
Publisher
ELSEVIER
DOI: 10.1016/j.knosys.2021.106901
Keywords
Hybrid feature selection; Ensemble feature selection; Multiple classifiers; Robust feature subset; High-dimensional imbalanced data
Funding
- Basic Science Research Program through the National Research Foundation of Korea (NRF) - Ministry of Education, Science and Technology, South Korea [NRF-2016 R1D1A1B03932110]
Research on feature selection for high-dimensional imbalanced data has attracted considerable attention. This paper proposes a hybrid method that combines the filter approach with ensemble learning to select the best feature subset.
In recent years, research on feature selection for high-dimensional imbalanced data has attracted a considerable amount of attention. The filter-wrapper hybrid method, a conventional approach to feature selection for high-dimensional data, aims to reduce excessive computational time. Ensemble learning-based feature selection, on the other hand, focuses exclusively on discovering robust features, despite its high computational complexity. Because the two approaches pursue different goals, combining them is not easy. However, such a combined method is essential to advancing machine learning research that addresses real-world problems. We propose a filter-centric hybrid method based on ensemble learning that can select the best feature subset for high-dimensional imbalanced data. The basic concept of the proposed method is to design a feature evaluation scheme based on the filter method and to apply ensemble learning within a reasonable computational time. To achieve this, our method uses the predictions produced by multiple classifiers as inputs to the feature evaluation function. As a result, it reflects the predictive performance of the classifiers and overcomes the low predictive performance of features selected by filter methods. At the same time, it can identify robust features. To demonstrate the superiority of the proposed method, we perform various experiments on 14 datasets comprising low-dimensional balanced, high-dimensional balanced, and high-dimensional imbalanced data. Finally, we compare the proposed method with state-of-the-art feature selection methods. (c) 2021 Elsevier B.V. All rights reserved.
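The abstract does not state the feature evaluation function itself, so the following Python sketch only illustrates the general idea of feeding several classifiers' predictions into a filter score. It is a hypothetical example under stated assumptions, not the authors' method: features are assumed to be already discretized, mutual information is assumed as the filter criterion, each feature's score is the mean of its mutual information with each classifier's predicted labels, and a simple top-k vote across classifiers stands in as a rough robustness signal. All function names and the toy data are illustrative.

```python
from collections import Counter
from math import log
from typing import List, Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # p(a,b) * log(p(a,b) / (p(a) p(b))), written with counts to limit rounding
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features (ties keep original order)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def ensemble_filter_scores(features: Sequence[Sequence[int]],
                           predictions: Sequence[Sequence[int]]) -> List[float]:
    """Assumed scheme: average a feature's MI with each classifier's predicted labels."""
    return [sum(mutual_information(col, pred) for pred in predictions) / len(predictions)
            for col in features]


def vote_counts(features: Sequence[Sequence[int]],
                predictions: Sequence[Sequence[int]], k: int) -> List[int]:
    """Assumed robustness signal: how many classifiers place each feature in their top k."""
    votes = [0] * len(features)
    for pred in predictions:
        per_feature = [mutual_information(col, pred) for col in features]
        for j in select_top_k(per_feature, k):
            votes[j] += 1
    return votes


# Toy data: in practice these predicted labels would come from trained classifiers.
predictions = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # hypothetical classifier A
    [0, 0, 0, 1, 1, 1, 1, 1],  # hypothetical classifier B
    [0, 0, 0, 0, 1, 1, 1, 0],  # hypothetical classifier C
]
features = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # informative feature
    [0, 1, 0, 1, 0, 1, 0, 1],  # mostly noise
    [1, 1, 1, 1, 1, 1, 1, 1],  # constant feature (carries no information)
]

scores = ensemble_filter_scores(features, predictions)
print(select_top_k(scores, 2), vote_counts(features, predictions, 1))
```

Averaging over several classifiers' predictions lets the filter score reflect what the classifiers actually learn, while the vote count favours features that stay highly ranked regardless of which classifier is used. A real implementation would also need a strategy for imbalanced classes and for discretizing continuous features, neither of which is specified here.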