Article

Real-value negative selection over-sampling for imbalanced data set learning

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 129, Pages 118-134

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2019.04.011

Keywords

Imbalanced data set; Over-sampling technique; Real-value negative selection; Under-sampling

Funding

  1. Fundamental Research Funds for the Central Universities [2572017E1302, 2572017E1307]
  2. Innovative talent fund of Harbin science and technology Bureau [2017RAXXJ018]
  3. Double first-class scientific research foundation of Northeast Forestry University [411112438]

Abstract

Learning from imbalanced data sets poses a major challenge for the data mining community. Conventional machine learning algorithms perform poorly on imbalanced classification problems because they were originally designed for balanced class distributions. In this paper, we propose a new over-sampling technique that uses the real-value negative selection (RNS) procedure to generate artificial minority data without requiring any actual minority data. The generated minority data, together with any rare actual minority data that are available, are combined with the majority data as input to a bi-class classification approach for learning. In the experiments, we demonstrate the effectiveness of RNS in avoiding problems often encountered by existing over-sampling methods, such as the generation of noisy instances and of near-duplicate instances within the same clusters. Moreover, extensive experimental results on imbalanced datasets from the UCI repository and on real-world imbalanced datasets show that the proposed hybrid approach achieves better performance in terms of both G-Mean and F-Measure than other existing imbalanced dataset classification techniques. (C) 2019 Elsevier Ltd. All rights reserved.
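
The abstract does not spell out the RNS procedure, so the following is only a minimal sketch of the general idea, assuming a fixed matching radius and uniform candidate sampling: majority samples are treated as "self", and random candidates are kept as artificial minority points only if they match no self sample. The function name `rns_oversample` and the parameters `self_radius`, `bounds`, and `max_tries` are hypothetical and are not taken from the paper.

```python
import math
import random


def rns_oversample(majority, n_new, self_radius, bounds, seed=0, max_tries=100_000):
    """Generate synthetic minority points by fixed-radius negative selection.

    Hypothetical helper: candidates are drawn uniformly inside `bounds` and
    kept only if they lie farther than `self_radius` from every majority
    ("self") sample. No actual minority data is needed.
    """
    rng = random.Random(seed)
    synthetic = []
    tries = 0
    while len(synthetic) < n_new and tries < max_tries:
        tries += 1
        candidate = [rng.uniform(lo, hi) for lo, hi in bounds]
        # Negative selection: reject any candidate that matches a self sample.
        if all(math.dist(candidate, s) > self_radius for s in majority):
            synthetic.append(candidate)
    return synthetic


# Toy majority class clustered in the lower-left corner of the unit square.
toy_rng = random.Random(1)
majority = [[toy_rng.uniform(0.0, 0.4), toy_rng.uniform(0.0, 0.4)] for _ in range(50)]

synthetic_minority = rns_oversample(
    majority, n_new=20, self_radius=0.15, bounds=[(0.0, 1.0), (0.0, 1.0)]
)
print(len(synthetic_minority))
```

In a pipeline like the one the abstract describes, the synthetic points (plus any real minority samples) would then be labeled as the minority class and passed with the majority data to an ordinary two-class classifier. The radius controls how close artificial points may come to the majority region; the value used here is arbitrary.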
