Article

Smooth Soft-Balance Discriminative Analysis for imbalanced data

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 228

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2020.106604

Keywords

Imbalance classification; Smoothing; Soft-balanced clustering; Discriminative analysis

Funding

  1. National Natural Science Foundation of China [61822601, 61773050, 61632004]
  2. Beijing Natural Science Foundation, China [Z180006]
  3. National Key Research and Development Program, China [2017YFC1703506]
  4. Fundamental Research Funds for the Central Universities, China [2018JBZ006, 2019JBZ110, 2019YJS040]
  5. Science and technology innovation planning foundation of colleges and Universities under the Ministry of Education


Imbalance classification is a challenging research topic in machine learning because discriminative features are difficult to acquire. This study introduces a Smooth Soft-Balance Discriminative Analysis method that preprocesses underrepresented data and mines the structure of majority classes, achieving better classification performance than state-of-the-art methods.

Imbalance classification is a challenging research topic in the machine learning community, in which it is difficult to acquire discriminative features. To date, a series of methods have been proposed, but they still suffer from two issues. The first is caused by underrepresented data, for which the boundaries between classes are not clear. The second is the complex structure of the majority classes. To address these two issues, a Smooth Soft-Balance Discriminative Analysis method (S²BDA) is proposed to deal with imbalanced data. In this method, the underrepresented data is preprocessed via a smoothing technique so that a compact representation of each class can be obtained, making the boundaries between classes more explicit. To mine the structure of the majority classes while preserving the pattern hidden in the minority class, a soft-balance clustering model is designed to determine the subclasses of the majority class. Based on the balanced subclasses, S²BDA uses subclass-aware discriminant analysis to extract discriminative features for imbalanced data classification. Extensive experiments are conducted on two synthetic data sets and sixteen real-world data sets with various imbalance ratios (from 4 to 39.18), data sizes (from 132 to 20000), numbers of categories (from 2 to 9) and dimensionalities (from 4 to 178). The experimental results demonstrate that S²BDA outperforms the state-of-the-art methods in terms of widely used evaluation metrics. © 2020 Published by Elsevier B.V.
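
The general idea of splitting a majority class into subclasses and then running discriminant analysis on the subclass labels can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: plain k-means stands in for the soft-balance clustering model, the smoothing step is omitted, and every function name, parameter, and the toy data below are assumptions made for illustration only.

```python
import numpy as np

def imbalance_ratio(y):
    """Largest class size divided by smallest class size."""
    counts = np.unique(y, return_counts=True)[1]
    return counts.max() / counts.min()

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; a stand-in, NOT the paper's soft-balance clustering."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def split_majority(X, y, majority_label, k):
    """Relabel the majority class into k subclasses; each other class stays whole."""
    sub = np.empty(len(y), dtype=int)
    mask = y == majority_label
    sub[mask] = kmeans(X[mask], k)
    next_id = k
    for c in np.unique(y[~mask]):
        sub[y == c] = next_id
        next_id += 1
    return sub

def fisher_projection(X, labels, n_components):
    """Fisher discriminant directions computed from the given (sub)class labels."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-subclass scatter
    Sb = np.zeros((d, d))  # between-subclass scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += 1e-6 * np.eye(d)  # small ridge so the linear solve is well-posed
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)[:n_components]
    return evecs[:, order].real

# Toy data (hypothetical): a two-blob majority class surrounding a small minority class.
rng = np.random.default_rng(0)
majority = np.vstack([rng.normal([-3, 0], 0.5, (80, 2)),
                      rng.normal([3, 0], 0.5, (80, 2))])
minority = rng.normal([0, 0], 0.5, (20, 2))
X = np.vstack([majority, minority])
y = np.array([0] * 160 + [1] * 20)

ratio = imbalance_ratio(y)              # 160 / 20 = 8.0
sub = split_majority(X, y, majority_label=0, k=2)
W = fisher_projection(X, sub, n_components=1)
Z = X @ W                               # 1-D discriminative features
```

In this toy setting, a single class-level discriminant would see majority and minority means that nearly coincide, whereas the subclass labels expose the separation along the first axis. A balanced clustering, as the abstract describes, would additionally try to keep the subclass sizes comparable to the minority class, which plain k-means does not do.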
