☆ 4.7 Article

Clustering Data Streams Based on Shared Density between Micro-Clusters

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2016)

期刊

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING

卷 28, 期 6, 页码 1449-1461

出版社

IEEE COMPUTER SOC

DOI: 10.1109/TKDE.2016.2522412

关键词

Data mining; data stream clustering; density-based clustering

类别

Computer Science, Artificial Intelligence Computer Science, Information Systems Engineering, Electrical & Electronic

资金

US National Science Foundation [IIS-0948893]

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

As more and more applications produce streaming data, clustering data streams has become an important technique for data and knowledge engineering. A typical approach is to summarize the data stream in real-time with an online process into a large number of so called micro-clusters. Micro-clusters represent local density estimates by aggregating the information of many data points in a defined area. On demand, a (modified) conventional clustering algorithm is used in a second offline step to recluster the microclusters into larger final clusters. For reclustering, the centers of the micro-clusters are used as pseudo points with the density estimates used as their weights. However, information about density in the area between micro-clusters is not preserved in the online process and reclustering is based on possibly inaccurate assumptions about the distribution of data within and between micro-clusters (e.g., uniform or Gaussian). This paper describes DBSTREAM, the first micro-cluster-based online clustering component that explicitly captures the density between micro-clusters via a shared density graph. The density information in this graph is then exploited for reclustering based on actual density between adjacent micro-clusters. We discuss the space and time complexity of maintaining the shared density graph. Experiments on a wide range of synthetic and real data sets highlight that using shared density improves clustering quality over other popular data stream clustering methods which require the creation of a larger number of smaller microclusters to achieve comparable results.

Clustering Data Streams Based on Shared Density between Micro-Clusters

期刊

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING

出版社

IEEE COMPUTER SOC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Clustering Data Streams Based on Shared Density between Micro-Clusters

期刊

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING

出版社

IEEE COMPUTER SOC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文