Article

Conditional Wasserstein generative adversarial network-gradient penalty-based approach to alleviating imbalanced data classification

Journal

INFORMATION SCIENCES
Volume 512, Pages 1009-1023

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2019.10.014

Keywords

Imbalanced learning; Oversampling approach; SMOTE; GAN; WGAN-GP

Funding

  1. National Natural Science Foundation of China [61662085, 61862065]
  2. Yunnan Province Ph.D. Scholar Newcomer Award
  3. Second round of the Yunnan University Service Yunnan Action Plan project
  4. Research and application of big data of intelligent transportation in Yunnan Province [2016ZD05]
  5. Yunnan Provincial Natural Science Foundation Fundamental Research Project [2019FB-16]
  6. Project of Yunnan Provincial Department of Education Science Research Fund [2017ZZX227]
  7. Yunnan University Data-Driven Software Engineering Provincial Science and Technology Innovation Team Project [2017HC012]
  8. Yunnan University Dong Lu Young-backbone Teacher Training Program
  9. Yunnan University Education Department Science Research Fund Graduate Program [2019Y0008, 2019Y0010]

In data mining, common classification algorithms cannot effectively learn from imbalanced data. Oversampling addresses this problem by creating data for the minority class in order to balance the class distribution before the model is trained. Traditional oversampling approaches are based on the Synthetic Minority Oversampling TEchnique (SMOTE), which focuses on local information but generates insufficiently realistic data. In contrast, the Generative Adversarial Network (GAN) captures the true data distribution in order to generate data for the minority class. However, both approaches are problematic owing to mode collapse and unstable training. To overcome these problems, we propose Conditional Wasserstein GAN-Gradient Penalty (CWGAN-GP), a novel and efficient synthetic oversampling approach for imbalanced datasets, which is constructed by adding auxiliary conditional information to the WGAN-GP. CWGAN-GP generates more realistic data and overcomes the aforementioned problems. Experiments on 15 different benchmark datasets and two real imbalanced datasets empirically demonstrate that CWGAN-GP increases the quality of synthetic data; furthermore, our approach outperforms the other oversampling approaches on three evaluation metrics (F-measure, G-mean, and the area under the receiver operating characteristic curve) for five classifiers. Friedman and Nemenyi post hoc statistical tests also confirm that CWGAN-GP is superior to the other oversampling approaches. © 2019 Elsevier Inc. All rights reserved.
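
To make the idea of oversampling from local information concrete, here is a minimal sketch of SMOTE-style interpolation in plain Python. It is not the authors' CWGAN-GP and not a reference SMOTE implementation; the function name `smote_like_oversample`, its parameters, and the toy points are hypothetical and chosen only for illustration.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each placed on the segment between a
    randomly chosen minority point and one of its k nearest minority neighbours.
    Illustrative only; assumes `minority` holds at least two points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority points
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> target
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic


# Toy imbalanced data (hypothetical values): 6 majority vs. 3 minority points.
majority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.1, 0.4), (0.4, 0.2)]
minority = [(2.0, 2.0), (2.2, 2.1), (2.1, 2.4)]

# Generate just enough synthetic minority points to balance the two classes.
new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints: 6 6
```

Every synthetic point lies between two existing minority points, which is the sense in which this family of methods relies on local information rather than on a learned model of the data distribution.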
