4.7 Article

Classification in Dynamic Data Streams With a Scarcity of Labels

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2021.3135755

Keywords

Active learning; data stream classification; dynamic classification; one class classification; scarcity of labels


This paper proposes COCEL, an algorithm for classification in dynamic data streams. It combines a stream clustering algorithm with an ensemble of one-class classifiers to recognize and react to changes in the data stream. Experimental results show that COCEL achieves superior or comparable accuracy to peer stream classification ensembles while using less labeled data.
Ensemble techniques are a powerful method for recognising and reacting to changes in non-stationary data. However, most research into dynamic classification with ensembles assumes that the true class label of each incoming point is available or easily obtained. This is unrealistic in most practical applications, especially in high-velocity streams where manually labeling each point is prohibitively expensive. To address this challenge, this paper proposes an algorithm named Clustering and One-Class Classification Ensemble Learning (COCEL), which combines a stream clustering algorithm and an ensemble of one-class classifiers with active learning for classification in dynamic data streams. The method exploits the intuitive relationship between clusters and one-class classifiers to cope with a small training set (or no training set) and to improve with experience, modifying its own internal state to cope with changes in the data stream. The proposed method is evaluated on synthetic data streams exhibiting concept evolution and concept drift, and on a collection of high-velocity real data streams where manually labeling each incoming point is infeasible, or expensive and labor-intensive. Finally, a comparative evaluation with peer stream classification ensembles shows that COCEL can achieve superior or comparable accuracy while typically requiring less than 0.01% of the stream labels.
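
The abstract does not spell out COCEL's internals, so the following is only a toy illustration of the general idea it describes, not the authors' algorithm: an ensemble of one-class models that absorbs points it recognises without asking for a label, and requests a label only when a point falls in an unknown or contested region. Everything here is an assumption made for illustration: the names `SphereModel`, `ToyStreamEnsemble` and `run_demo`, the fixed-radius hypersphere used as the one-class model, the moving-average update, and the `radius` and `rate` values.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class SphereModel:
    """Hypothetical one-class model: a centre plus a fixed acceptance radius."""

    def __init__(self, point, label, radius, rate):
        self.center = list(point)
        self.label = label
        self.radius = radius
        self.rate = rate

    def accepts(self, point):
        return euclidean(self.center, point) <= self.radius

    def update(self, point):
        # Exponential moving average, so the centre can follow gradual drift.
        self.center = [c + self.rate * (x - c)
                       for c, x in zip(self.center, point)]


class ToyStreamEnsemble:
    """Hypothetical ensemble that queries a label only when it is unsure."""

    def __init__(self, oracle, radius=2.0, rate=0.1):
        self.oracle = oracle    # callable: point -> true label (the costly step)
        self.radius = radius    # assumed acceptance radius
        self.rate = rate        # assumed moving-average rate
        self.models = []
        self.queries = 0        # number of labels requested so far

    def process(self, point):
        accepting = [m for m in self.models if m.accepts(point)]
        labels = {m.label for m in accepting}
        if len(labels) == 1:
            # Recognised region: update the nearest model, no label needed.
            nearest = min(accepting, key=lambda m: euclidean(m.center, point))
            nearest.update(point)
            return nearest.label
        # Unknown region or conflicting models: request the true label.
        self.queries += 1
        label = self.oracle(point)
        matching = [m for m in accepting if m.label == label]
        if matching:
            nearest = min(matching, key=lambda m: euclidean(m.center, point))
            nearest.update(point)
        else:
            self.models.append(
                SphereModel(point, label, self.radius, self.rate))
        return label


def run_demo(steps=200):
    """Two well-separated synthetic classes, each drifting slowly."""
    truth = {}
    stream = []
    for t in range(steps):
        drift = 0.01 * t
        a = (drift + 0.5 * math.cos(t), 0.5 * math.sin(t))
        b = (10.0 + 0.5 * math.sin(t), 10.0 - drift + 0.5 * math.cos(t))
        truth[a] = "a"
        truth[b] = "b"
        stream += [a, b]
    ensemble = ToyStreamEnsemble(oracle=truth.__getitem__)
    correct = sum(ensemble.process(p) == truth[p] for p in stream)
    return ensemble.queries, correct / len(stream)
```

Calling `run_demo()` returns the number of labels requested and the accuracy on a deliberately easy synthetic stream of 400 points. On that stream the sketch needs one label per class, because every later point falls inside a model that has already been labeled. Real streams with overlapping classes or abrupt drift would trigger many more queries, and the sketch makes no attempt to reproduce the results reported in the paper.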
