Article

Constructing classifiers for imbalanced data using diversity optimisation

Journal

INFORMATION SCIENCES
Volume 565, Pages 1-16

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2021.02.069

Keywords

Imbalanced data; Diversity optimisation; Classification

The study proposes a new approach that uses diversity optimisation to generate synthetic instances for classification with imbalanced data. Both proposed formulations are competitive in improving classifier performance, with DIWO outperforming comparable methods, and both are robust in that they reduce classifier variance.
Imbalanced data is challenging in classification. This paper proposes a new approach to address imbalanced data by adopting diversity optimisation to generate synthetic instances for over-sampling the minority class. Diversity optimisation ensures that the generated instances are close to the minority group but not identical to its members, and that they are optimally spread in the feature space. We develop two formulations, named Diversity-based Average Distance Over-sampling (DADO) and Diversity-based Instance Wise Over-sampling (DIWO). We evaluate the proposed formulations through experiments on both synthetic and real data with imbalanced classes. We measure performance using area under the curve (AUC), F1-score and g-mean, and compare against comparable synthetic over-sampling methods. The comparison uses the measures obtained by the best-performing classifier, together with statistical testing of all combinations over three imbalance levels and seven classifiers. The results show that both proposed formulations are competitive in improving classifier performance, and that DIWO outperforms the other comparable methods. Both are also robust, in that they reduce the classifiers' variance. We discuss the strengths and limitations of these formulations using the real data examples, runtime complexity and sensitivity analysis. We also demonstrate the possibility of utilising DADO and DIWO for multi-class imbalanced data. (c) 2021 Elsevier Inc. All rights reserved.
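
To make two of the evaluation measures named in the abstract concrete, the following minimal sketch computes F1-score and g-mean from binary predictions, using their standard definitions (g-mean as the geometric mean of sensitivity and specificity). It is an illustration only and is not taken from the paper; the function name `imbalance_metrics` and the toy labels are assumptions, and AUC is omitted because it requires classifier scores rather than hard predictions.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return accuracy, F1-score and g-mean for binary labels.

    Hypothetical helper for illustration; `positive` marks the minority class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard each ratio against an empty denominator.
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    g_mean = math.sqrt(recall * specificity)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "f1": f1, "g_mean": g_mean}


# Toy example: 2 minority instances out of 10 (assumed data, not from the paper).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = imbalance_metrics(y_true, y_pred)
print(metrics)
```

In this toy case only half of the minority instances are recovered, which the F1-score and g-mean reflect more directly than plain accuracy does. That is one reason such measures are commonly reported for imbalanced problems.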
