Proceedings Paper

Discovering Minority Sub-clusters and Local Difficulty Factors from Imbalanced Data

Journal

DISCOVERY SCIENCE, DS 2017
Volume 10558, Pages 324-339

Publisher

SPRINGER INTERNATIONAL PUBLISHING AG
DOI: 10.1007/978-3-319-67786-6_23

Keywords

Class imbalance; Minority class categorization; Data difficulty factors; Class overlapping; Minority sub-clusters

Funding

  1. Polish National Science Center [DEC-2013/11/B/ST6/00963]
  2. Institute of Computing Science Statutory Funds

Learning classifiers from imbalanced data is particularly challenging when class imbalance is accompanied by local data difficulty factors, such as outliers, rare cases, class overlapping, or minority class decomposition. Although these issues have been highlighted in previous research, there have been no proposals of algorithms that simultaneously detect all the aforementioned difficulties in a dataset. In this paper, we put forward two extensions to popular clustering algorithms, ImKmeans and ImScan, and one novel algorithm, ImGrid, that attempt to detect minority sub-clusters, outliers, rare cases, and class overlapping. Experiments with artificial datasets show that ImGrid, which uses a Bayesian test to join similar neighboring regions, is able to re-discover simulated clusters and types of minority examples on par with competing methods, while being the least sensitive to parameter tuning.
