4.7 Article

ESA-Stream: Efficient Self-Adaptive Online Data Stream Clustering

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2020.2990196

Keywords

Clustering algorithms; Heuristic algorithms; Real-time systems; Partitioning algorithms; Dimensionality reduction; Clustering methods; Indexes; Self-adaptive; data stream; online clustering

Funding

  1. National Natural Science Foundation of China [61472296, 61672408, 61976168]
  2. Fundamental Research Funds for the Central Universities [JB181505]
  3. Natural Science Basic Research Plan in Shaanxi Province of China [2018JM6073]
  4. China 111 Project [B16037]

This paper proposes a fully online data stream clustering algorithm called ESA-Stream, which can dynamically learn parameters in a self-adaptive manner, speed up dimensionality reduction, and effectively and efficiently cluster data streams in an online and dynamic environment. Experimental results on a wide range of synthetic and real-world data streams show that ESA-Stream outperforms state-of-the-art baselines considerably in both effectiveness and efficiency.
Many big data applications produce a massive amount of high-dimensional, real-time, and evolving streaming data. Clustering such data streams both effectively and efficiently is critical for these applications. Although there are well-known data stream clustering algorithms based on the popular online-offline framework, these algorithms still face some major challenges. Several critical questions are still not answered satisfactorily: How can dimensionality reduction be performed effectively and efficiently in an online dynamic environment? How can the clustering algorithm achieve complete real-time online processing? How can algorithm parameters be learned in a self-supervised or self-adaptive manner to cope with high-speed evolving streams? In this paper, we focus on tackling these challenges by proposing a fully online data stream clustering algorithm (called ESA-Stream) that can learn parameters online dynamically in a self-adaptive manner, speed up dimensionality reduction, and cluster data streams effectively and efficiently in an online and dynamic environment. Experiments on a wide range of synthetic and real-world data streams show that ESA-Stream outperforms state-of-the-art baselines considerably in both effectiveness and efficiency.

