Dynamic Ensemble Selection for Imbalanced Data Streams With Concept Drift

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNNLS.2022.3183120

Keywords

Training; Bagging; Adaptation models; Data models; Learning systems; Control engineering; Sun; Concept drift; data stream; dynamic ensemble selection (DES); imbalance learning; oversampling

Funding

  1. National Natural Science Foundation of China [61973305, 52121003, 61573361]
  2. Six Talent Peak Project in Jiangsu Province [2017-DZXX-046]
  3. 111 Project [B21014]
  4. Royal Society International Exchanges 2020 Cost Share

Abstract

This study proposes a dynamic ensemble selection method for handling concept drift in imbalanced data streams. By generating new minority instances with a novel oversampling technique and selecting the optimal classifier combination based on candidate classifier performance, the proposed method outperforms the compared methods in classification accuracy and in tracking new concepts.

Ensemble learning, a popular approach to handling concept drift in data streams, combines base classifiers according to their global performance. However, concept drift generally occurs in a local region of the data space, so a base classifier can perform very differently at different locations. Global performance is therefore an inappropriate criterion for selecting base classifiers. Moreover, data streams often exhibit class imbalance, which degrades the classification accuracy of ensemble learning on minority instances. To address these problems, a dynamic ensemble selection method for imbalanced data streams with concept drift (DES-ICD) is proposed. For data arriving chunk by chunk, a novel synthetic minority oversampling technique with adaptive nearest neighbors (AnnSMOTE) is developed to generate new minority instances that conform to the new concept. DES-ICD then trains a base classifier on the newly arrived data chunk, balanced by AnnSMOTE, and merges it with historical base classifiers to form a candidate classifier pool. For each query instance, the optimal combination is constructed according to the performance of the candidate classifiers in its neighborhood. Experimental results on nine synthetic and five real-world datasets show that the proposed method outperforms seven comparative methods in classification accuracy and tracks new concepts in an imbalanced data stream more precisely.
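
The idea of selecting classifiers by their performance in the neighborhood of each query instance can be illustrated with a minimal Python sketch. This is not the DES-ICD algorithm itself, whose details the abstract does not give; it is a generic local-accuracy selection scheme under stated assumptions. The function names (`k_nearest`, `local_accuracy`, `des_predict`), the Euclidean neighborhood, the choice of `k`, the rule of keeping only the locally best classifiers, and the toy data are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter

def k_nearest(query, points, k):
    """Return indices of the k points closest to query (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return order[:k]

def local_accuracy(clf, X, y, idx):
    """Fraction of the labeled neighbors in idx that clf classifies correctly."""
    return sum(clf(X[i]) == y[i] for i in idx) / len(idx)

def des_predict(query, pool, X_val, y_val, k=3):
    """Majority vote among the classifiers that are most accurate near query."""
    idx = k_nearest(query, X_val, k)
    scores = [local_accuracy(clf, X_val, y_val, idx) for clf in pool]
    best = max(scores)
    # Assumption: keep only the classifiers tied for the best local accuracy.
    selected = [clf for clf, s in zip(pool, scores) if s == best]
    votes = Counter(clf(query) for clf in selected)
    return votes.most_common(1)[0][0]

# Toy candidate pool: a "historical" classifier and one fit to a drifted region.
def old_clf(x):
    return int(x[0] > 0.5)

def new_clf(x):
    return int(x[0] <= 0.5)

pool = [old_clf, new_clf]

# Recent labeled instances: the concept has flipped only where x[1] > 0.
X_val = [(0.1, 1.0), (0.3, 1.1), (0.9, 0.9), (0.1, -1.0), (0.3, -1.1), (0.9, -0.9)]
y_val = [1, 1, 0, 0, 0, 1]

drifted_pred = des_predict((0.2, 1.0), pool, X_val, y_val)   # query in drifted region
stable_pred = des_predict((0.2, -1.0), pool, X_val, y_val)   # query in stable region
print(drifted_pred, stable_pred)
```

In this toy setup both classifiers have the same global accuracy (50%) on the labeled instances, so a global criterion cannot distinguish them, while the local criterion picks the classifier that matches the concept around each query.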