Article

Optimization Algorithms for Scalable Stream Batch Clustering with k Estimation

Journal

APPLIED SCIENCES-BASEL
Volume 12, Issue 13

Publisher

MDPI
DOI: 10.3390/app12136464

Keywords

machine learning; clustering; data stream; massive parallel computation

Funding

  1. Coordenação de Aperfeiçoamento de Pessoal de Nível Superior-Brasil (CAPES) [001]
  2. Fundação de Amparo à Pesquisa do Estado de São Paulo-FAPESP [2019/09817-6]
  3. CNPq
  4. FAPEMIG

This study aims to improve the accuracy of sequential batch clustering of data streams by automatically estimating the number of clusters as the clusters evolve dynamically and continuously. It proposes three evolutionary algorithms and three novel algorithms based on goodness-of-fit tests; in experiments, they achieve the best accuracy on normally distributed data sets.
The growing volume and velocity of continuously generated data (data streams) challenge machine learning algorithms, which must evolve to fit real-world problems. Data stream clustering algorithms must cope with rapidly increasing data volume and with variation in the number of clusters and in their shapes. This work aims to improve the accuracy of sequential batch clustering of data streams in scenarios where clusters evolve dynamically and continuously, while automatically estimating the number of clusters (k). To this end, three evolutionary algorithms are presented, along with three novel algorithms that use goodness-of-fit tests to handle normally distributed clusters, all in the context of scalable batch stream clustering with automatic estimation of k. All of them are built on MapReduce, Discretized-Stream models, and recent massive parallel computation (MPC) frameworks to provide scalability, reliability, resilience, and flexibility. In an experimental comparison with state-of-the-art methods, the proposed algorithms achieve the best accuracy on normally distributed data sets.
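
The idea of estimating k with a goodness-of-fit test can be illustrated with a small sketch. It is not the paper's method: the abstract does not name the test or the splitting rule, so the code below assumes one-dimensional data, uses the Jarque-Bera normality test as a stand-in, and splits a rejected cluster with a plain 2-means step. The function names, the `alpha` and `min_size` parameters, and the single-machine setting (no MapReduce or Discretized-Stream layer) are all illustrative assumptions.

```python
import math


def jarque_bera_pvalue(xs):
    """Asymptotic p-value of the Jarque-Bera normality test (assumed stand-in test)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    if m2 == 0.0:
        return 1.0  # constant sample: nothing to split
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # JB is asymptotically chi-square with 2 degrees of freedom,
    # whose survival function is exp(-x / 2).
    return math.exp(-jb / 2.0)


def two_means_1d(xs, iters=20):
    """Split a 1-D sample into two groups with a few Lloyd iterations."""
    c1, c2 = min(xs), max(xs)
    left, right = list(xs), []
    for _ in range(iters):
        left = [x for x in xs if abs(x - c1) <= abs(x - c2)]
        right = [x for x in xs if abs(x - c1) > abs(x - c2)]
        if not right:
            break
        n1, n2 = sum(left) / len(left), sum(right) / len(right)
        if (n1, n2) == (c1, c2):
            break  # centers stopped moving
        c1, c2 = n1, n2
    return left, right


def split_until_normal(xs, alpha=0.05, min_size=20):
    """Recursively split a batch until every cluster passes the normality test.

    The number of returned clusters is the estimate of k for this batch.
    """
    if len(xs) < min_size or jarque_bera_pvalue(xs) >= alpha:
        return [xs]
    left, right = two_means_1d(xs)
    if not right:
        return [xs]
    return (split_until_normal(left, alpha, min_size)
            + split_until_normal(right, alpha, min_size))
```

Calling `len(split_until_normal(batch))` on each incoming batch gives a per-batch estimate of k. A greedy split like this can over-split when a 2-means step cuts through a true cluster, so it should be read as an illustration of the principle only.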
