4.7 Article

Real-value negative selection over-sampling for imbalanced data set learning

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 129, Pages 118-134

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2019.04.011

Keywords

Imbalanced data set; Over-sampling technique; Real-value negative selection; Under-sampling

Funding

  1. Fundamental Research Funds for the Central Universities [2572017E1302, 2572017E1307]
  2. Innovative talent fund of Harbin science and technology Bureau [2017RAXXJ018]
  3. Double first-class scientific research foundation of Northeast Forestry University [411112438]

Learning from imbalanced data sets poses a major challenge for the data mining community. Conventional machine learning algorithms perform poorly on imbalanced classification problems because they were originally designed for balanced class distributions. In this paper, we propose a new over-sampling technique that uses the real-value negative selection (RNS) procedure to generate artificial minority data without requiring any actual minority data to be available. The generated minority data, together with the rare actual minority data if available, are combined with the majority data as input to a bi-class classification approach for learning. In the experiments, we demonstrate the effectiveness of RNS in avoiding problems often encountered by existing over-sampling methods, such as the generation of noisy instances and of almost duplicated instances in the same clusters. Moreover, extensive experimental results on imbalanced datasets from the UCI repository and on real-world imbalanced datasets show that the proposed hybrid approach achieves better performance in terms of both the G-Mean and F-Measure evaluation metrics than other existing imbalanced dataset classification techniques. (C) 2019 Elsevier Ltd. All rights reserved.
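
The general idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain form of negative selection in which uniformly random candidate points are kept only if they lie farther than a fixed radius from every majority ("self") sample. The names rns_oversample and g_mean_and_f_measure, the fixed radius, and the uniform sampling bounds are all hypothetical choices made for illustration. The second function computes G-Mean and F-Measure from their standard definitions, with the minority class treated as positive.

```python
import math
import random


def rns_oversample(majority, n_new, radius, bounds, seed=0, max_tries=100_000):
    """Generate up to n_new synthetic minority points by negative selection.

    Assumption for illustration: a uniformly random candidate is accepted
    only if its Euclidean distance to every majority ("self") sample
    exceeds `radius`. No actual minority samples are needed.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(max_tries):
        if len(synthetic) == n_new:
            break
        candidate = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        if all(math.dist(candidate, m) > radius for m in majority):
            synthetic.append(candidate)
    return synthetic


def g_mean_and_f_measure(y_true, y_pred, positive=1):
    """Return (G-Mean, F-Measure), treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    g_mean = math.sqrt(recall * specificity)
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return g_mean, f_measure


# Toy usage: three majority points in the unit square, five synthetic points.
majority = [(0.5, 0.5), (0.55, 0.45), (0.4, 0.6)]
synthetic = rns_oversample(majority, n_new=5, radius=0.2,
                           bounds=[(0.0, 1.0), (0.0, 1.0)])
g_mean, f_measure = g_mean_and_f_measure([1, 1, 0, 0], [1, 0, 0, 0])
```

In this sketch the synthetic points would then be labelled as minority and combined with the majority data (and any real minority samples) before training an ordinary two-class classifier. The radius controls how close generated points may come to the majority region, so it is the main parameter to tune under these assumptions.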
