4.7 Article

Semi-supervised classification on data streams with recurring concept drift and concept evolution

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 215, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2021.106749

Keywords

Data stream classification; Concept evolution; Recurring concept drift

Funding

  1. National Key Research and Development Program of China [2016YFB1000901]
  2. National Natural Science Foundation of China [61976077, 61876206, 62076085, 91746209]
  3. Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education [IRT17R32]


Mining non-stationary streams poses challenges due to their infinite length, dynamic characteristics, concept drift, concept evolution, and limited labeled data. Existing supervised methods may perform poorly and run inefficiently when labeled data are scarce. This paper proposes a semi-supervised framework, ESCR, to detect recurring concept drifts and concept evolution in data streams with partially labeled data. The framework uses clustering-based classifiers, Jensen-Shannon divergence for change detection, and outlier monitoring for concept evolution, and it improves efficiency through recursive functions and dynamic programming. Extensive experiments show the effectiveness and efficiency of ESCR compared to other semi-supervised methods.
Mining non-stationary streams is a challenging task because of their infinite length and dynamic characteristics, let alone the issues of concept drift, concept evolution, and limited labeled data. Although concept drift and concept evolution in data streams have attracted increasing attention, most existing methods are supervised in nature, which is likely to result in worse classification performance and lower efficiency when labeled data are scarce. In this paper, we therefore propose a semi-supervised framework with recurring concept drift and novel class detection, called ESCR, which aims to detect recurring concept drift and concept evolution in data streams with partially labeled data. It is built on an ensemble model consisting of several clustering-based classifiers. Within this framework, we apply a Jensen-Shannon divergence based change detection technique to classifier confidence scores, rather than to the classification error rate, to detect recurring concept drifts. Meanwhile, we take concept evolution into consideration by monitoring outliers with strong cohesion. Moreover, we further improve the execution efficiency of the framework by exploiting recursive functions and dynamic programming. Finally, extensive experiments conducted on both benchmark and synthetic data sets demonstrate the effectiveness and efficiency of the proposed semi-supervised framework in handling data streams with recurring concept drifts and concept evolution, as compared with several well-known semi-supervised data stream classification methods. (C) 2021 Elsevier B.V. All rights reserved.
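
To make the change detection idea concrete, the following is a minimal sketch of comparing two windows of classifier confidence scores with the Jensen-Shannon divergence. It is not the ESCR implementation: the abstract does not give the windowing scheme, the binning, or the decision threshold, so the names `histogram`, `drift_detected`, the 10-bin histogram, the additive smoothing, and the threshold of 0.1 are all assumptions chosen for illustration.

```python
import math
import random


def histogram(scores, bins=10):
    """Bin confidence scores in [0, 1] into a smoothed probability distribution.

    The bin count and the add-one smoothing are illustrative assumptions.
    """
    counts = [1.0] * bins  # add-one smoothing so every bin has non-zero mass
    for s in scores:
        idx = min(int(s * bins), bins - 1)  # a score of exactly 1.0 goes in the last bin
        counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def drift_detected(reference_scores, recent_scores, threshold=0.1, bins=10):
    """Flag a change when the two score distributions diverge beyond a threshold.

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    p = histogram(reference_scores, bins)
    q = histogram(recent_scores, bins)
    return js_divergence(p, q) > threshold


# Synthetic confidence scores: a confident classifier, then a less confident one.
rng = random.Random(0)
stable = [rng.betavariate(8, 2) for _ in range(500)]   # mostly high confidence
same = [rng.betavariate(8, 2) for _ in range(500)]     # same regime, new sample
drifted = [rng.betavariate(2, 2) for _ in range(500)]  # confidence spread around 0.5

print(drift_detected(stable, same))
print(drift_detected(stable, drifted))
```

Working on confidence scores rather than error rates means the comparison needs no true labels for the recent window, which is what makes this kind of detector usable when only part of the stream is labeled.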
