Article

DG-means: a superior greedy algorithm for clustering distributed data

Journal

JOURNAL OF SUPERCOMPUTING
Volume -, Issue -, Pages -

Publisher

SPRINGER
DOI: 10.1007/s11227-023-05508-5

Keywords

Clustering; k-means; Distributed clustering; G-means; Data mining


Clustering is the process of dividing objects into classes based on their similarities. Traditional centralized algorithms cannot handle distributed objects, but distributed clustering algorithms can extract a classification model from objects spread across different locations. As more data is stored at separate sites and the volume of data on the web grows, distributed clustering is becoming a prominent field. Despite challenges such as limited bandwidth and the cost of transferring data, the DG-means algorithm outperforms the other algorithms it is compared with on runtime, stability, and accuracy.
Clustering divides a set of objects into several classes, where each class is composed of similar objects. Traditional centralized clustering algorithms target objects located on the same site, since they cannot operate on distributed objects. Distributed clustering algorithms fill this gap: they extract a classification model from the objects even when these reside at different sites and locations. With the trend of storing data in different locations, and with the vast amount of data propagating throughout the web, distributed clustering seems set to become one of the prevailing fields. Even though much research has been done on this topic, it is still considered to be in its infancy because of the challenges that keep arising, such as bandwidth limitations and the cost of transferring data to a single site. In this work, we present DG-means, a greedy algorithm that operates on distributed data sets. Three datasets (the wholesale dataset, the banknotes dataset, and the Iris dataset) are used to compare multiple distributed clustering algorithms on three metrics: execution runtime, stability, and accuracy. DG-means exhibited superior performance when compared to the other algorithms.
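To make the general idea of distributed clustering concrete, the following is a minimal sketch of one common pattern: each site clusters its own data and sends only centroids and cluster sizes to a coordinator, which merges them. This is not the DG-means algorithm, whose details the abstract does not give. The function names (`kmeans`, `local_summary`, `merge_summaries`), the synthetic two-site data, and the choice of weighted k-means for the merge step are all assumptions made for illustration.

```python
import numpy as np


def kmeans(points, k, weights=None, iters=50, seed=0):
    """Plain (optionally weighted) Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    if weights is None:
        weights = np.ones(len(points))

    def assign(cents):
        # Distance from every point to every centroid, then nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - cents[None, :, :], axis=2)
        return dists.argmin(axis=1)

    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        labels = assign(centroids)
        for j in range(k):
            mask = labels == j
            if mask.any():  # keep the old centroid if a cluster empties
                centroids[j] = np.average(points[mask], axis=0, weights=weights[mask])
    return centroids, assign(centroids)


def local_summary(site_data, k):
    """Runs at each site: only k centroids and k counts leave the site."""
    centroids, labels = kmeans(site_data, k)
    counts = np.bincount(labels, minlength=k)
    return centroids, counts


def merge_summaries(summaries, k):
    """Runs at the coordinator: cluster the local centroids, weighted by size."""
    centroids = np.vstack([c for c, _ in summaries])
    counts = np.concatenate([n for _, n in summaries]).astype(float)
    keep = counts > 0  # ignore local clusters that ended up empty
    global_centroids, _ = kmeans(centroids[keep], k, weights=counts[keep])
    return global_centroids


# Hypothetical data: two sites, each holding samples from the same two blobs.
rng = np.random.default_rng(42)
sites = [
    np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(10.0, 0.5, (50, 2))])
    for _ in range(2)
]
summaries = [local_summary(data, k=2) for data in sites]
global_centroids = merge_summaries(summaries, k=2)
print(global_centroids)
```

In this sketch the raw points never leave their site, which speaks to the bandwidth and data-transfer concerns mentioned above; the trade-off is that the coordinator only sees a coarse summary of each site's data.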
