4.7 Article

A reliable adaptive prototype-based learning for evolving data streams with limited labels

Journal

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.ipm.2023.103532

Keywords

Data streams; Data-driven prototypes; Concept drift; Concept evolution; Semi-supervised classification

Data stream mining faces challenges of concept drift and evolution. Existing learning algorithms require class labels for all data points, but the rapid pace of data streams often leads to label scarcity. To address this, we propose an adaptive, data-driven, prototype-based semi-supervised learning framework that uses dynamic prototypes to handle evolving data streams and achieve improved data abstraction and detection of novel classes.
Data stream mining presents notable challenges in the form of concept drift and concept evolution. Existing learning algorithms, typically designed within a supervised learning framework, require class labels for all data points. This requirement is impractical given the rapid pace of data streams, which often results in label scarcity. Recognizing the practical need to learn from data streams with limited labels, we propose an adaptive, data-driven, prototype-based semi-supervised learning framework tailored to evolving data streams. Our method employs a prototype-based data representation, summarizing the continuous flow of streaming data using dynamic prototypes at varying levels of granularity. This representation improves data abstraction and captures the underlying local data distributions more accurately. The model also incorporates reliability modeling and efficient emerging class discovery, dynamically updating the significance of prototypes over time and adapting quickly to local concept drift. We further use these adaptive prototypes to detect concept evolution, i.e., to identify novel classes from a local density perspective. To minimize the need for manual labeling while optimizing performance, we incorporate active learning into our method. The active learning step selects data points using two criteria, uncertainty and local density. The manually labeled data points, together with unlabeled data, are used to update the model efficiently and robustly. Empirical validation on several benchmark datasets demonstrates promising performance compared with existing state-of-the-art techniques.
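
The dual-criteria selection described in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract gives no formulas, so the distance-weighted class estimate, the radius-based density, the product score, and all function names below are assumptions chosen for illustration only.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def class_posterior(x, prototypes, n_classes):
    """Soft class estimate from distance-weighted votes of labeled prototypes.

    `prototypes` is a list of (center, label) pairs. Weighting each vote by
    exp(-squared distance) is an illustrative assumption, not the paper's rule.
    """
    votes = [0.0] * n_classes
    for center, label in prototypes:
        votes[label] += math.exp(-math.dist(x, center) ** 2)
    total = sum(votes)
    if total == 0.0:
        # All prototypes are too far away to contribute: fall back to uniform.
        return [1.0 / n_classes] * n_classes
    return [v / total for v in votes]


def local_density(x, window, radius):
    """Fraction of points in the current window lying within `radius` of x."""
    return sum(1 for y in window if math.dist(x, y) <= radius) / len(window)


def select_queries(window, prototypes, n_classes, radius, budget):
    """Return indices of the `budget` points with the highest
    uncertainty * density score (ties broken by lower index)."""
    scores = []
    for i, x in enumerate(window):
        u = entropy(class_posterior(x, prototypes, n_classes))
        d = local_density(x, window, radius)
        scores.append((u * d, i))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores[:budget]]


# Two labeled prototypes and a small window of unlabeled points.
prototypes = [((0.0, 0.0), 0), ((4.0, 0.0), 1)]
window = [
    (0.1, 0.0),   # close to class 0: low uncertainty
    (2.0, 0.0),   # between the classes, in a dense group
    (2.1, 0.1),
    (1.9, -0.1),
    (4.0, 0.1),   # close to class 1: low uncertainty
    (2.0, 5.0),   # equally uncertain, but isolated
]
queries = select_queries(window, prototypes, n_classes=2, radius=0.5, budget=1)
```

Multiplying the two criteria means an uncertain but isolated point, such as the last one in `window`, ranks below an equally uncertain point in a dense region, so the labeling budget goes to points that represent many neighbors.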
