Journal
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Volume 34, Issue 2, Pages 617-630
Publisher
IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2020.2990196
Keywords
Clustering algorithms; Heuristic algorithms; Real-time systems; Partitioning algorithms; Dimensionality reduction; Clustering methods; Indexes; Self-adaptive; data stream; online clustering
Funding
- National Natural Science Foundation of China [61472296, 61672408, 61976168]
- Fundamental Research Funds for the Central Universities [JB181505]
- Natural Science Basic Research Plan in Shaanxi Province of China [2018JM6073]
- China 111 Project [B16037]
This paper proposes a fully online data stream clustering algorithm called ESA-Stream, which can dynamically learn parameters in a self-adaptive manner, speed up dimensionality reduction, and effectively and efficiently cluster data streams in an online and dynamic environment. Experimental results on a wide range of synthetic and real-world data streams show that ESA-Stream outperforms state-of-the-art baselines considerably in both effectiveness and efficiency.
Many big data applications produce a massive amount of high-dimensional, real-time, and evolving streaming data. Clustering such data streams both effectively and efficiently is critical for these applications. Although there are well-known data stream clustering algorithms based on the popular online-offline framework, these algorithms still face some major challenges. Several critical questions have not yet been answered satisfactorily: How can dimensionality reduction be performed effectively and efficiently in an online dynamic environment? How can a clustering algorithm achieve complete real-time online processing? How can algorithm parameters be learned in a self-supervised or self-adaptive manner to cope with high-speed evolving streams? In this paper, we focus on tackling these challenges by proposing a fully online data stream clustering algorithm (called ESA-Stream) that can learn parameters online dynamically in a self-adaptive manner, speed up dimensionality reduction, and cluster data streams effectively and efficiently in an online and dynamic environment. Experiments on a wide range of synthetic and real-world data streams show that ESA-Stream outperforms state-of-the-art baselines considerably in both effectiveness and efficiency.