Journal
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS
Volume 36, Issue 7, Pages 3039-3067
Publisher
WILEY
DOI: 10.1002/int.22388
Keywords
ensemble; hybrid sampling; imbalanced data; information entropy; pattern recognition
Funding
- National Major Scientific and Technological Special Project for Significant New Drugs Development [2019ZX09201004]
- Shanghai Science and Technology Program [20511100600]
- Natural Science Foundation of China [62076094]
- Natural Science Foundations of China [61806078]
- National Key Research and Development Project of Ministry of Science and Technology of China [2018AAA0101302]
- National Science Foundation of China for Distinguished Young Scholars [61725301]
The paper proposes a hybrid sampling method based on information entropy that addresses two problems in learning from imbalanced data: the loss of important samples during undersampling and the class overlap caused by oversampling. The method retains all data in the training process, handles each data view with an individual basic classifier, and combines the classifiers through ensemble learning. Experiments on real-world data sets demonstrate its effectiveness.
Sampling is one of the most commonly used techniques for dealing with imbalanced data. Most existing undersampling methods randomly select samples from the negative class with replacement, which may lose important information in the training data. Moreover, increasing the positive data by oversampling in highly imbalanced situations may cause the overlapping problem. To overcome these problems, this paper proposes a hybrid sampling method. The method uses information entropy to take the distribution of the training data into account, and thus distinguishes the important samples in the undersampling procedure. Meanwhile, because oversampling extends the positive data only to the size of each subset of the negative class, the overlapping problem is relieved. Further, the method retains all the data in the training procedure and generates various data views from the original training data. Each view is then handled by an individual basic classifier, and all the basic classifiers are finally combined by an ensemble method. The proposed method is named Entropy-based Hybrid Sampling Ensemble Learning (EHSEL). In addition, EHSEL is applied to three different kinds of basic classifiers to validate its robustness. Experimental results show the effectiveness of EHSEL on real-world imbalanced data sets.
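The pipeline described in the abstract can be illustrated with a rough sketch. The abstract does not specify how the entropy is computed, how the negative class is split, which basic classifiers are used, or how they are combined, so everything concrete below is an assumption made for illustration and not the authors' EHSEL implementation: entropy is taken as the Shannon entropy of the labels among a sample's k nearest neighbours, negatives are dealt round-robin into disjoint subsets after ranking by that entropy, positives are randomly duplicated up to the subset size, the basic classifier is a nearest-centroid rule, and the ensemble is a majority vote. All function names are hypothetical.

```python
import math
import random

def knn_label_entropy(X, y, idx, k=5):
    """Shannon entropy (bits) of the labels among the k nearest neighbours of X[idx].
    Assumed importance measure; the abstract does not give the paper's definition."""
    dists = sorted((math.dist(X[idx], X[j]), j) for j in range(len(X)) if j != idx)
    neigh = [y[j] for _, j in dists[:k]]
    p = sum(neigh) / len(neigh)  # fraction of positive neighbours
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def make_views(X, y, n_views=3, k=5, seed=0):
    """Build data views: each pairs a disjoint negative subset with oversampled positives."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    # Rank negatives by local entropy (high = mixed neighbourhood) and deal them
    # round-robin, so every negative is kept and each view gets boundary samples.
    neg.sort(key=lambda i: knn_label_entropy(X, y, i, k), reverse=True)
    views = []
    for v in range(n_views):
        subset = neg[v::n_views]
        # Oversample positives with replacement, only up to the subset size.
        extra = [rng.choice(pos) for _ in range(max(0, len(subset) - len(pos)))]
        views.append((subset, pos + extra))
    return views

def fit_centroids(X, neg_idx, pos_idx):
    """Stand-in basic classifier: store the mean of each class in one view."""
    def mean(idx):
        return [sum(X[i][d] for i in idx) / len(idx) for d in range(len(X[0]))]
    return mean(neg_idx), mean(pos_idx)

def predict(models, x):
    """Majority vote over the per-view nearest-centroid classifiers."""
    votes = sum(1 for c_neg, c_pos in models if math.dist(x, c_pos) < math.dist(x, c_neg))
    return 1 if 2 * votes > len(models) else 0

# Toy imbalanced data: 60 negatives around (0, 0), 6 positives around (3, 3).
rng = random.Random(0)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
X += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(6)]
y = [0] * 60 + [1] * 6

views = make_views(X, y, n_views=3)
models = [fit_centroids(X, neg_idx, pos_idx) for neg_idx, pos_idx in views]
print([(len(n), len(p)) for n, p in views])  # per-view (negatives, positives) counts
print(predict(models, (3.0, 3.0)), predict(models, (0.0, 0.0)))
```

Because the negative subsets are disjoint and together cover the whole negative class, no training sample is discarded, and each view stays balanced without inflating the positive class to the full size of the negative class.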