Article

Entropy-based hybrid sampling ensemble learning for imbalanced data

Journal

International Journal of Intelligent Systems
Volume 36, Issue 7, Pages 3039-3067

Publisher

Wiley
DOI: 10.1002/int.22388

Keywords

ensemble; hybrid sampling; imbalanced data; information entropy; pattern recognition

Funding

  1. National Major Scientific and Technological Special Project for Significant New Drugs Development [2019ZX09201004]
  2. Shanghai Science and Technology Program [20511100600]
  3. Natural Science Foundation of China [62076094]
  4. Natural Science Foundation of China [61806078]
  5. National Key Research and Development Project of Ministry of Science and Technology of China [2018AAA0101302]
  6. National Science Foundation of China for Distinguished Young Scholars [61725301]

The paper proposes a hybrid sampling method based on information entropy to address the loss of important samples and the class-overlap problem when learning from imbalanced data. The method retains all training data, trains an individual base classifier on each data view, combines these classifiers through ensemble learning, and is shown to be effective on real-world data sets.

Abstract

Sampling is one of the most commonly used techniques for dealing with imbalanced data. Most existing undersampling methods randomly select samples from the negative class with replacement, which may discard important information in the training data. Moreover, increasing the positive data by oversampling in highly imbalanced situations may cause the overlapping problem. To overcome these problems, this paper proposes a hybrid sampling method. The method uses information entropy to take the distribution of the training data into account, thereby distinguishing the important samples in the undersampling procedure. Meanwhile, because oversampling only expands the positive data to the size of each negative-class subset, the overlapping problem is alleviated. Further, the method retains all the data in the training procedure and generates various data views from the original training data. Each view is then handled by an individual base classifier, and all the base classifiers are finally combined by an ensemble method. The proposed method is named Entropy-based Hybrid Sampling Ensemble Learning (EHSEL). In addition, EHSEL is applied to three different kinds of base classifiers to validate its robustness. Experimental results show that EHSEL is highly effective on real-world imbalanced data sets.
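
The abstract names the pipeline steps (entropy-guided undersampling into negative subsets, oversampling positives only up to each subset's size, one base classifier per view, ensemble combination) but not the exact entropy measure, partitioning rule, oversampler, or base classifiers. The Python sketch below illustrates that pipeline under stated assumptions and is not the authors' EHSEL implementation. Assumptions: a sample's importance is scored by the Shannon entropy of the labels among its k nearest neighbours; negatives are dealt round-robin by descending entropy into `n_views` subsets so that no negative sample is discarded; positives are grown by SMOTE-style interpolation; the base classifier is a nearest-centroid model; and views are combined by majority vote. All function names and the parameters `n_views`, `k` and `seed` are hypothetical.

```python
import math
import random
from collections import Counter


def local_entropy(i, X, y, k=5):
    """Shannon entropy (bits) of the labels of the k nearest neighbours of X[i].

    Hypothetical importance score: high entropy means X[i] sits near the class boundary.
    """
    neighbours = sorted((math.dist(X[i], X[j]), y[j]) for j in range(len(X)) if j != i)[:k]
    counts = Counter(label for _, label in neighbours)
    n = len(neighbours)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_partition(neg_idx, scores, n_views):
    """Deal negative indices, sorted by descending entropy, round-robin into n_views subsets.

    Every negative sample is kept exactly once, and each subset receives a share of the
    high-entropy (boundary) samples instead of losing them to random undersampling.
    """
    ordered = sorted(neg_idx, key=lambda i: scores[i], reverse=True)
    return [ordered[v::n_views] for v in range(n_views)]


def oversample(pos, target, rng):
    """Grow the positive set to `target` points by SMOTE-style interpolation between random pairs."""
    out = [list(p) for p in pos]
    while len(out) < target:
        a, b = rng.sample(pos, 2) if len(pos) > 1 else (pos[0], pos[0])
        t = rng.random()
        out.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return out


def fit_centroids(X, y):
    """Stand-in base classifier: one centroid per class."""
    cents = {}
    for label in set(y):
        pts = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(pts) for col in zip(*pts)]
    return cents


def predict_centroid(cents, x):
    """Return the label whose centroid is closest to x."""
    return min(cents, key=lambda label: math.dist(cents[label], x))


def fit_ehsel_sketch(X, y, n_views=3, k=5, seed=0):
    """Train one base classifier per data view (negative subset + oversampled positives).

    Labels: 0 = negative (majority), 1 = positive (minority).
    """
    neg = [i for i, t in enumerate(y) if t == 0]
    pos = [X[i] for i, t in enumerate(y) if t == 1]
    if not neg or not pos:
        raise ValueError("need at least one sample of each class")
    rng = random.Random(seed)
    scores = [local_entropy(i, X, y, k) for i in range(len(X))]
    models = []
    for subset in entropy_partition(neg, scores, min(n_views, len(neg))):
        # Positives are only expanded up to the size of this negative subset.
        pos_view = oversample(pos, len(subset), rng)
        Xv = [X[i] for i in subset] + pos_view
        yv = [0] * len(subset) + [1] * len(pos_view)
        models.append(fit_centroids(Xv, yv))
    return models


def predict(models, x):
    """Combine the base classifiers by majority vote."""
    return Counter(predict_centroid(m, x) for m in models).most_common(1)[0][0]


if __name__ == "__main__":
    data_rng = random.Random(1)
    X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(60)]
    X += [[data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)] for _ in range(8)]
    y = [0] * 60 + [1] * 8
    models = fit_ehsel_sketch(X, y, n_views=3, k=5)
    print(predict(models, [3.0, 3.0]), predict(models, [0.0, 0.0]))
```

Dealing the entropy-sorted negatives round-robin is just one simple way to keep every training sample while spreading boundary samples across views; the paper's actual partitioning and entropy definition may differ, and any of the stand-in base classifiers could be swapped for the three kinds the authors evaluate.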