Article

Feature Selection and Ensemble Learning Techniques in One-Class Classifiers: An Empirical Study of Two-Class Imbalanced Datasets

Journal

IEEE ACCESS
Volume 9, Pages 13717-13726

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2021.3051969

Keywords

Classification algorithms; Testing; Feature extraction; Training data; Training; Anomaly detection; Vegetation; Data mining; one-class classifiers; class imbalance; machine learning; ensemble learning

Funding

  1. Ministry of Science and Technology of Taiwan [MOST 109-2410-H-182-012]
  2. Chang Gung Memorial Hospital, Linkou [BMRPH13]

The study treats class imbalance as an anomaly detection problem and investigates how one-class classification (OCC) classifiers perform individually and when combined through ensemble learning. The results show that OCC classifiers perform well on datasets with high class imbalance ratios and that feature selection usually does not improve their performance, whereas combining multiple OCC classifiers can outperform the individual classifiers.
Class imbalance learning is an important research problem in data mining and machine learning. Most solutions, including data-level, algorithm-level, and cost-sensitive approaches, are derived using multi-class classifiers, depending on the number of classes to be classified. One-class classification (OCC) techniques, in contrast, have been widely used for anomaly or outlier detection, where only normal or positive class training data are available. In this study, we treat every two-class imbalanced dataset as an anomaly detection problem, in which the majority class (i.e., the normal or positive class) contains a large number of samples and the minority class contains very few. The research objectives of this paper are to understand the performance of OCC classifiers and to examine how much their performance improves when feature selection is used to pre-process the majority-class training data and when ensemble learning is employed to combine multiple OCC classifiers. Based on 55 datasets covering different ranges of class imbalance ratios, with one-class support vector machine, isolation forest, and local outlier factor as the representative OCC classifiers, we found that the OCC classifiers perform well on datasets with high imbalance ratios, outperforming the C4.5 baseline. In most cases, however, performing feature selection does not improve the performance of the OCC classifiers. In contrast, many homogeneous and heterogeneous OCC classifier ensembles do outperform the single OCC classifiers, and some specific combinations of multiple OCC classifiers, both with and without feature selection, perform similarly to or better than the baseline combination of SMOTE and C4.5.
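The core setup in the abstract is that each OCC model sees only majority-class training data, and minority samples are flagged as anomalies at test time. The following is a minimal Python sketch of that idea, assuming scikit-learn's `OneClassSVM`, `IsolationForest`, and `LocalOutlierFactor` as stand-ins for the three OCC classifiers and a simple majority vote as one possible heterogeneous ensemble. The helper names (`make_occ_models`, `fit_on_majority`, `predict_minority`, `majority_vote`), the standardization step, the synthetic data, and all hyperparameters are hypothetical choices for illustration, not the authors' experimental pipeline; feature selection and the SMOTE + C4.5 baseline are not shown.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def make_occ_models(seed=0):
    """The three OCC families named in the abstract; hyperparameters are illustrative assumptions."""
    return {
        "ocsvm": OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),
        "iforest": IsolationForest(n_estimators=100, random_state=seed),
        # novelty=True is required to call predict() on unseen data
        "lof": LocalOutlierFactor(n_neighbors=20, novelty=True),
    }


def fit_on_majority(X_train, y_train, majority_label=0, seed=0):
    """Fit every OCC model on majority-class rows only; minority rows are ignored."""
    X_major = X_train[y_train == majority_label]
    scaler = StandardScaler().fit(X_major)  # scaling is an assumed pre-processing step
    X_scaled = scaler.transform(X_major)
    models = make_occ_models(seed)
    for model in models.values():
        model.fit(X_scaled)
    return scaler, models


def predict_minority(scaler, models, X):
    """Map scikit-learn's +1 (inlier) / -1 (outlier) output to 0 (majority) / 1 (minority)."""
    X_scaled = scaler.transform(X)
    return {name: (model.predict(X_scaled) == -1).astype(int)
            for name, model in models.items()}


def majority_vote(predictions):
    """Ensemble rule: label a sample minority if more than half of the models flag it."""
    votes = np.vstack(list(predictions.values()))
    return (2 * votes.sum(axis=0) > votes.shape[0]).astype(int)


if __name__ == "__main__":
    # Hypothetical synthetic 2-D data: 500 majority rows vs. 10 minority rows.
    rng = np.random.default_rng(0)
    X_train = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(6, 0.5, (10, 2))])
    y_train = np.array([0] * 500 + [1] * 10)
    scaler, models = fit_on_majority(X_train, y_train)

    X_test = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(6, 0.5, (20, 2))])
    y_test = np.array([0] * 200 + [1] * 20)
    ensemble = majority_vote(predict_minority(scaler, models, X_test))
    print("minority recall:", ensemble[y_test == 1].mean())
    print("majority false-positive rate:", ensemble[y_test == 0].mean())
```

Because scikit-learn's outlier detectors return +1 for inliers and -1 for outliers, the sketch maps -1 to the minority label. A homogeneous ensemble could be built the same way by passing several instances of one detector (for example, isolation forests with different seeds) to `majority_vote`.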