Article

An efficiency curve for evaluating imbalanced classifiers considering intrinsic data characteristics: Experimental analysis

Journal

INFORMATION SCIENCES
Volume 608, Pages 1131-1156

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2022.06.045

Keywords

Classification; Imbalanced dataset; Data intrinsic characteristics; Assessment metrics; Efficiency

Funding

  1. National Natural Science Foundation of China [71874023, 71725001, 71771037, 71971042]


Balancing the accuracy rates of majority and minority classes is challenging in imbalanced classification. This study introduces a new criterion, an efficiency curve, to comprehensively evaluate imbalanced classifiers and analyzes the impact of imbalanced ratio and data characteristics on classifier efficiency.
Balancing the accuracy rates of the majority and minority classes is challenging in imbalanced classification. Furthermore, data characteristics, which existing evaluation methods generally neglect, have a significant impact on the performance of imbalanced classifiers. The objective of this study is to introduce a new criterion to comprehensively evaluate imbalanced classifiers. Specifically, we introduce an efficiency curve, established using data envelopment analysis without explicit inputs (DEA-WEI), to determine the trade-off between the benefit of improved minority class accuracy and the cost of reduced majority class accuracy. We then analyze the impact of the imbalanced ratio and typical imbalanced data characteristics on the efficiency of the classifiers. Empirical analyses using 68 imbalanced datasets reveal that traditional classifiers such as C4.5 and the k-nearest neighbor are more effective on disjunct data, whereas ensemble and undersampling techniques are more effective for overlapping and noisy data. The efficiency of cost-sensitive classifiers decreases dramatically as the imbalanced ratio increases. Finally, we investigate the reasons for the different efficiencies of classifiers on imbalanced data and recommend steps to select appropriate classifiers for imbalanced data based on data characteristics. © 2022 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
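
To make the DEA-WEI idea more concrete, the following is a minimal sketch of a generic output-only DEA efficiency score, not the paper's exact formulation. It assumes that each classifier is described by two outputs, minority class accuracy and majority class accuracy, and that efficiency is the optimum of the multiplier model "maximize u1*y1 + u2*y2 subject to u1*a + u2*b <= 1 for every classifier (a, b), with u1, u2 >= 0". The function name `dea_wei_efficiency` and the four example classifiers are hypothetical. Because there are only two weights, the linear program is solved by enumerating the vertices of the feasible region instead of calling a solver.

```python
from itertools import combinations


def dea_wei_efficiency(outputs, target, eps=1e-9):
    """Efficiency of `target` under a generic two-output DEA model without inputs.

    outputs: list of (minority_acc, majority_acc) pairs for all classifiers.
    target:  the (minority_acc, majority_acc) pair being scored.
    Assumption: this is an illustrative model, not the paper's exact one.
    """
    # Each constraint (c1, c2, rhs) means c1*u1 + c2*u2 <= rhs.
    # The last two encode the sign restrictions u1 >= 0 and u2 >= 0.
    constraints = [(a, b, 1.0) for a, b in outputs]
    constraints += [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]

    best = 0.0  # the origin (u1 = u2 = 0) is always feasible
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue  # parallel constraint lines have no unique intersection
        # Intersection of the two constraint lines (Cramer's rule).
        u1 = (r1 * b2 - r2 * b1) / det
        u2 = (a1 * r2 - a2 * r1) / det
        # Keep the point only if it satisfies every constraint.
        if all(c1 * u1 + c2 * u2 <= r + eps for c1, c2, r in constraints):
            best = max(best, u1 * target[0] + u2 * target[1])
    return best


# Hypothetical classifiers: (minority class accuracy, majority class accuracy).
classifiers = {
    "A": (0.90, 0.60),
    "B": (0.60, 0.90),
    "C": (0.70, 0.70),
    "D": (0.50, 0.50),
}
all_outputs = list(classifiers.values())
scores = {name: dea_wei_efficiency(all_outputs, y) for name, y in classifiers.items()}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

In this toy example, A and B lie on the efficient frontier and score 1, while C and D fall below the segment joining A and B and score less than 1. Under these assumptions, a score below 1 indicates that some combination of the other classifiers achieves a better trade-off between the two class accuracies.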
