Article

Classification in Dynamic Data Streams With a Scarcity of Labels

Journal

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Volume 35, Issue 4, Pages 3512-3524

Publisher

IEEE Computer Society
DOI: 10.1109/TKDE.2021.3135755

Keywords

Active learning; data stream classification; dynamic classification; one-class classification; scarcity of labels


This paper proposes an algorithm named COCEL for classification in dynamic data streams. The algorithm combines a stream clustering algorithm with an ensemble of one-class classifiers to recognize and react to changes in the data stream. Experimental results show that COCEL achieves superior or comparable accuracy while using less labeled data than peer stream classification ensembles.
Ensemble techniques are a powerful method for recognising and reacting to changes in non-stationary data. However, most research on dynamic classification with ensembles assumes that the true class label of each incoming point is available or easily obtained. This is unrealistic in most practical applications, especially in high-velocity streams where manually labeling each point is prohibitively expensive. To address this challenge, this paper proposes an algorithm named Clustering and One-Class Classification Ensemble Learning (COCEL), which combines a stream clustering algorithm and an ensemble of one-class classifiers with active learning for classification in dynamic data streams. The method exploits the intuitive relationship between clusters and one-class classifiers to cope with a small training set (or no training set) and to improve with experience, self-modifying its internal state to cope with changes in the data stream. The proposed method is evaluated on synthetic data streams exhibiting concept evolution and concept drift, and on a collection of high-velocity real data streams where manually labeling each incoming point is infeasible, or expensive and labor-intensive. Finally, a comparative evaluation with peer stream classification ensembles shows that COCEL can achieve superior or comparable accuracy while typically requiring less than 0.01% of the stream labels.
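To make the cluster/one-class-classifier relationship concrete, here is a minimal toy sketch in Python. It is not COCEL and does not reproduce the paper's method; it only illustrates the general idea named in the abstract, under stated assumptions. Each ensemble member is a hypothetical cluster (centroid plus a fixed radius) that acts as a one-class classifier for a single label. A point claimed by some member is labeled without a query; a point claimed by no member is treated as possible novelty, so its label is requested from an `oracle` callback (the active-learning step) and a new member is created. The class names, the fixed `radius`, nearest-centroid selection and the `oracle` callback are all illustrative assumptions.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class ClusterClassifier:
    """Hypothetical one-class classifier backed by one cluster (centroid + fixed radius)."""
    centroid: list[float]
    radius: float
    label: str
    count: int = 1

    def accepts(self, x) -> bool:
        # The member "claims" a point if it lies inside the cluster's radius.
        return math.dist(x, self.centroid) <= self.radius

    def update(self, x) -> None:
        # Incremental running-mean update of the centroid; the radius stays fixed.
        self.count += 1
        self.centroid = [c + (xi - c) / self.count for c, xi in zip(self.centroid, x)]


class ClusterOneClassEnsemble:
    """Toy ensemble that queries a label only when no member claims the point (assumption)."""

    def __init__(self, radius: float, oracle):
        self.radius = radius
        self.oracle = oracle  # hypothetical callable returning the true label; each call is one query
        self.members: list[ClusterClassifier] = []
        self.queries = 0

    def process(self, x) -> str:
        accepting = [m for m in self.members if m.accepts(x)]
        if accepting:
            # Predict with the nearest accepting member and let it absorb the point.
            best = min(accepting, key=lambda m: math.dist(x, m.centroid))
            best.update(x)
            return best.label
        # No member claims the point: possible novelty, so spend one label query.
        self.queries += 1
        label = self.oracle(x)
        self.members.append(ClusterClassifier(list(x), self.radius, label))
        return label


# Usage: two well-separated groups; the oracle labels points by their first coordinate.
stream = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (0.3, -0.1), (4.8, 5.1)]
model = ClusterOneClassEnsemble(radius=1.0, oracle=lambda x: "A" if x[0] < 2.5 else "B")
preds = [model.process(x) for x in stream]
print(preds, model.queries)
```

In this toy run only the first point of each group triggers a query, so six points cost two labels. Any label-efficiency figures, such as the abstract's under 0.01%, come from the paper's own experiments and are not something this sketch demonstrates.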

