4.7 Article

Online Semi-Supervised Classification on Multilabel Evolving High-Dimensional Text Streams

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TSMC.2023.3275298

Keywords

Graphical model; micro-clusters; semi-supervised learning; text stream; topic evolution

This article presents an online semi-supervised classification algorithm (OSMTS) for multilabel text streams. It dynamically maintains the subspace of terms for each label with evolving micro-clusters, and it classifies each instance with a non-parametric Dirichlet model over the k nearest micro-clusters. It handles gradual concept drift with a triangular time function, and abrupt concept drift by deleting outdated micro-clusters and creating new ones through the Chinese restaurant process based on the Dirichlet process.
The multilabel learning task aims to predict all of the classes associated with a given example simultaneously. This task becomes more challenging when data arrive as a stream, since it then requires an algorithm that is fast, robust, and adaptive to concept drift. In this article, we present an online semi-supervised classification algorithm (OSMTS) for multilabel text streams. By leveraging a few labeled instances, OSMTS dynamically maintains the subspace of terms for each label with a set of evolving micro-clusters. For multilabel classification, the k nearest micro-clusters are used for prediction through a non-parametric Dirichlet model. To handle gradual concept drift in the term space, a triangular time function is adopted to calculate the difference between a term's arrival time and the cluster life span. Abrupt concept drift, in contrast, is handled by two procedures: 1) deleting outdated micro-clusters by exploiting an exponential decay function and 2) creating new micro-clusters by adopting the Chinese restaurant process based on the Dirichlet process. The experimental study compares OSMTS with 12 state-of-the-art algorithms on nine datasets in terms of classification performance, runtime, and memory consumption.
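
The two abrupt-drift procedures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the base-2 decay form, the parameters `lam`, `threshold`, and `alpha`, the dictionary layout of a micro-cluster, and all function names below are assumptions chosen for illustration. The sketch shows only the textbook Chinese restaurant process prior; the abstract indicates that OSMTS uses it together with a Dirichlet model over terms, which is not reproduced here.

```python
import random


def decay_weight(weight, elapsed, lam=0.01):
    """Exponentially decayed weight: weight * 2^(-lam * elapsed).

    The base-2 form and the rate `lam` are assumptions, not taken from the paper.
    """
    return weight * 2.0 ** (-lam * elapsed)


def prune_outdated(clusters, now, lam=0.01, threshold=0.1):
    """Keep only micro-clusters whose decayed weight is still >= threshold.

    Each cluster is a hypothetical dict with 'weight' and 'last_update' keys.
    """
    return [
        c for c in clusters
        if decay_weight(c["weight"], now - c["last_update"], lam) >= threshold
    ]


def crp_probabilities(cluster_sizes, alpha=1.0):
    """Chinese restaurant process prior over cluster assignments.

    Existing cluster k gets n_k / (n + alpha); the final entry is the
    probability alpha / (n + alpha) of opening a new cluster.
    """
    n = sum(cluster_sizes)
    probs = [size / (n + alpha) for size in cluster_sizes]
    probs.append(alpha / (n + alpha))
    return probs


def crp_assign(cluster_sizes, alpha=1.0, rng=random):
    """Sample an index; an index equal to len(cluster_sizes) means 'new cluster'."""
    probs = crp_probabilities(cluster_sizes, alpha)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Under these assumptions, a micro-cluster that stops receiving documents loses weight until `prune_outdated` drops it, while a larger `alpha` makes `crp_assign` open new micro-clusters more readily when the stream shifts to an unseen topic.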
