Journal
EXPERT SYSTEMS WITH APPLICATIONS
Volume 129, Issue -, Pages 118-134
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2019.04.011
Keywords
Imbalanced data set; Over-sampling technique; Real-value negative selection; Under-sampling
Funding
- Fundamental Research Funds for the Central Universities [2572017E1302, 2572017E1307]
- Innovative talent fund of Harbin science and technology Bureau [2017RAXXJ018]
- Double first-class scientific research foundation of Northeast Forestry University [411112438]
Learning from imbalanced data sets poses a major challenge for the data mining community. Conventional machine learning algorithms perform poorly on imbalanced classification problems because they were originally designed for balanced class distributions. In this paper, we propose a new over-sampling technique that uses the real-value negative selection (RNS) procedure to generate artificial minority data without requiring any actual minority data. The generated minority data, together with the rare actual minority data if available, are combined with the majority data as input to a bi-class classification approach for learning. In the experiments, we demonstrate the effectiveness of RNS in avoiding problems often encountered by existing over-sampling methods, such as the generation of noisy instances and of near-duplicate instances in the same clusters. Moreover, extensive experimental results on imbalanced datasets from the UCI repository and on real-world imbalanced datasets show that the proposed hybrid approach achieves better performance in terms of both G-Mean and F-Measure than other existing imbalanced dataset classification techniques. (C) 2019 Elsevier Ltd. All rights reserved.
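The abstract does not spell out the RNS procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes candidates are drawn uniformly from the bounding box of the majority ("self") data and kept only when their Euclidean distance to every majority point exceeds a fixed radius. The names rns_oversample, build_training_set, self_radius and max_tries are hypothetical and chosen here for illustration.

```python
import numpy as np


def rns_oversample(majority, n_samples, self_radius=0.3, seed=0, max_tries=100_000):
    """Generate artificial minority points by a simple negative-selection rule.

    Assumption (not from the paper): a candidate is accepted as "non-self"
    if it lies farther than `self_radius` from every majority point.
    """
    rng = np.random.default_rng(seed)
    lo, hi = majority.min(axis=0), majority.max(axis=0)
    kept = []
    for _ in range(max_tries):
        if len(kept) == n_samples:
            break
        # Draw one candidate uniformly inside the majority bounding box.
        candidate = rng.uniform(lo, hi)
        dists = np.linalg.norm(majority - candidate, axis=1)
        if dists.min() > self_radius:
            kept.append(candidate)
    # May return fewer than n_samples rows if max_tries is exhausted.
    return np.array(kept).reshape(-1, majority.shape[1])


def build_training_set(majority, synthetic, minority=None):
    """Combine majority data (label 0) with synthetic and, if available,
    actual minority data (label 1) for a two-class classifier."""
    parts = [synthetic] if minority is None else [synthetic, minority]
    positives = np.vstack(parts)
    X = np.vstack([majority, positives])
    y = np.concatenate([
        np.zeros(len(majority), dtype=int),
        np.ones(len(positives), dtype=int),
    ])
    return X, y
```

Because acceptance depends only on the majority data, the sketch needs no minority samples at all, which matches the property the abstract emphasizes. The radius is a free parameter here; how the paper chooses detector radii is not stated in the abstract.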