4.7 Article

Online Semi-Supervised Classification on Multilabel Evolving High-Dimensional Text Streams

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TSMC.2023.3275298

Keywords

Graphical model; micro-clusters; semi-supervised learning; text stream; topic evolution

This article presents OSMTS, an online semi-supervised classification algorithm for multilabel text streams. It dynamically maintains the subspace of terms for each label with evolving micro-clusters and classifies with a non-parametric Dirichlet model over the k nearest micro-clusters. It handles gradual concept drift with a triangular time function, and abrupt concept drift by deleting outdated micro-clusters and creating new ones through the Chinese restaurant process, which is based on the Dirichlet process.
The multilabel learning task aims to predict all the classes associated with a given example simultaneously. This task becomes more challenging when data arrive as a stream, since it requires an algorithm that adapts to concept drift and is both robust and fast. In this article, we present an online semi-supervised classification algorithm (OSMTS) for multilabel text streams. By leveraging a few labeled instances, OSMTS dynamically maintains the subspace of terms for each label with a set of evolving micro-clusters. For multilabel classification, the k nearest micro-clusters are used for prediction through a non-parametric Dirichlet model. To handle gradual concept drift in the term space, a triangular time function is adopted to calculate the difference between term arrival time and cluster life span. Abrupt concept drift is handled by two procedures: 1) deleting outdated micro-clusters by exploiting an exponential decay function and 2) creating new micro-clusters by adopting the Chinese restaurant process, which is based on the Dirichlet process. The experimental study compares OSMTS with 12 state-of-the-art algorithms on nine datasets in terms of classification performance, runtime, and memory consumption.
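
The two abrupt-drift procedures can be illustrated with a minimal Python sketch. It is not the OSMTS implementation: the abstract gives no formulas, so the `MicroCluster` fields, the base-2 decay form, and the `decay_rate`, `threshold`, and `alpha` parameters are all assumptions chosen for illustration. The Chinese restaurant process probabilities follow the standard textbook form.

```python
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Hypothetical micro-cluster summary; all fields are illustrative assumptions."""
    label: str
    n_docs: int          # documents absorbed so far
    weight: float        # importance at the time of the last update
    last_update: float   # timestamp of the last update


def decayed_weight(mc, now, decay_rate):
    """Assumed exponential decay: weight * 2^(-decay_rate * elapsed time)."""
    return mc.weight * 2.0 ** (-decay_rate * (now - mc.last_update))


def prune_outdated(clusters, now, decay_rate=0.01, threshold=0.1):
    """Procedure 1: drop micro-clusters whose decayed weight fell below the threshold."""
    return [mc for mc in clusters
            if decayed_weight(mc, now, decay_rate) >= threshold]


def crp_new_cluster_probability(total_docs, alpha=1.0):
    """Procedure 2: standard CRP prior of opening a new cluster, alpha / (n + alpha),
    where n is the number of documents already assigned."""
    return alpha / (total_docs + alpha)


def crp_existing_cluster_probability(mc, total_docs, alpha=1.0):
    """Standard CRP prior of joining an existing cluster: n_k / (n + alpha)."""
    return mc.n_docs / (total_docs + alpha)


if __name__ == "__main__":
    clusters = [
        MicroCluster("sports", n_docs=3, weight=1.0, last_update=0.0),
        MicroCluster("politics", n_docs=1, weight=1.0, last_update=990.0),
    ]
    kept = prune_outdated(clusters, now=1000.0)
    print([mc.label for mc in kept])
    print(crp_new_cluster_probability(total_docs=4, alpha=1.0))
```

In a full classifier these priors would be combined with a term-based likelihood for each micro-cluster before deciding whether a document joins an existing cluster or starts a new one. The sketch omits that step, and it omits the triangular time function, because the abstract does not specify either.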
