Article

Evolving data stream clustering based on constant false clustering probability

Journal

INFORMATION SCIENCES
Volume 614, Pages 1-18

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2022.09.054

Keywords

Fully-online data stream clustering; Density-based method; Constant false clustering probability

Funding

  1. INSF (Iran National Science Foundation) [98011279]

This paper proposes a novel, fully-online, density-based method for clustering evolving data streams that addresses the problem of parameter selection in existing methods. The method can identify clusters with arbitrary shapes, is robust to noise, and provides high accuracy and efficiency in both low and high dimensions.
Today's world needs new methods to handle and analyze the ever-increasing volume of generated data streams. Two of the most challenging aspects of data streams are (i) concept drift, i.e., the evolution of the data stream over time, which requires timely decisions despite the high speed at which new data arrive; and (ii) limited memory, which makes storing the data impractical given its volume. Clustering is one of the common methods for processing data streams. In this paper, we propose a novel, fully-online, density-based method for clustering evolving data streams. In recent years, a number of methods have been proposed that can also cluster data streams. The main limitation of these methods is that their parameters must be set using expert knowledge. This work is among the first to address this issue. The proposed method, Constant False Clustering Probability (CFCP), chooses the algorithm's parameters based on statistics. It can also identify clusters with arbitrary shapes, is robust to noise, and offers high accuracy and efficiency in both low and high dimensions. In this method, we determine the parameter values using statistical theory, without requiring additional information from expert knowledge. The presented experimental results show that the method clusters data at high speed without reducing quality compared to state-of-the-art algorithms. (c) 2022 Elsevier Inc. All rights reserved.
