Effect of cluster size distribution on clustering: a comparative study of k-means and fuzzy c-means clustering

PATTERN ANALYSIS AND APPLICATIONS (2020)

Journal

PATTERN ANALYSIS AND APPLICATIONS

Volume 23, Issue 1, Pages 455-466

Publisher

SPRINGER

DOI: 10.1007/s10044-019-00783-6

Keywords

Clustering; Data distribution; k-means; Fuzzy c-means (FCM); Fuzzifier; Uniform effect

Funding

National Natural Science Foundation of China [71822104, 71501056, 71690235]
Anhui Science and Technology Major Project [17030901024]
China Postdoctoral Science Foundation [2017M612072]
Hong Kong Scholars Program [2017-167]

Abstract

Data distribution has a significant impact on clustering results. This study focuses on the effect of cluster size distribution on clustering, namely the uniform effect of k-means and fuzzy c-means (FCM) clustering. We first review related work on k-means and FCM clustering. We then present a structure decomposition analysis of the objective functions of k-means and FCM. Next, we conduct extensive experiments on synthetic two-dimensional and three-dimensional data sets and on real-world data sets from the UCI Machine Learning Repository. The results demonstrate that FCM has a stronger uniform effect than k-means clustering. They also reveal that the fuzzifier value m = 2, which has been widely adopted in FCM applications, is not a good choice, particularly for data sets with great variation in cluster sizes. Therefore, for data sets with significantly uneven cluster size distributions, a smaller fuzzifier value is preferred for FCM clustering, and k-means clustering is a better choice than FCM clustering.
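
To make the role of the fuzzifier m concrete, the following is a minimal sketch of the textbook FCM update rules in NumPy. It is not the authors' code and does not reproduce their experiments. The function names, the synthetic data in `demo` (one large and one small Gaussian cluster), and the fixed initial centers are all assumptions made for illustration.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    """Textbook FCM membership update: u_ik proportional to d_ik^(-2/(m-1)).

    X has shape (n, d), centers has shape (c, d); each returned row sums to 1.
    eps (an assumption of this sketch) avoids division by zero when a point
    coincides with a center.
    """
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + eps
    inv = d2 ** (-1.0 / (m - 1.0))  # squared distance, so the exponent is -1/(m-1)
    return inv / inv.sum(axis=1, keepdims=True)

def fcm_centers(X, U, m=2.0):
    """Textbook FCM center update: mean of X weighted by u_ik^m."""
    W = U ** m
    return (W.T @ X) / W.sum(axis=0)[:, None]

def fcm(X, init_centers, m=2.0, n_iter=100):
    """Alternate the two updates for a fixed number of iterations."""
    centers = np.asarray(init_centers, dtype=float)
    for _ in range(n_iter):
        U = fcm_memberships(X, centers, m)
        centers = fcm_centers(X, U, m)
    return centers, fcm_memberships(X, centers, m)

def cluster_sizes(U):
    """Cluster sizes after assigning each point to its highest membership."""
    return np.bincount(U.argmax(axis=1), minlength=U.shape[1])

def demo(seed=0):
    """Hypothetical imbalanced data: 1000 points in one cluster, 50 in the other."""
    rng = np.random.default_rng(seed)
    big = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(1000, 2))
    small = rng.normal(loc=[4.0, 0.0], scale=0.5, size=(50, 2))
    X = np.vstack([big, small])
    init = np.array([[0.0, 0.0], [4.0, 0.0]])
    for m in (1.2, 2.0, 3.0):
        _, U = fcm(X, init, m=m)
        print(f"m={m}: hard-assignment sizes = {cluster_sizes(U)}")
```

Calling `demo()` prints the hard-assignment cluster sizes for several fuzzifier values, which can be compared with the true sizes of 1000 and 50. As m approaches 1 the memberships become nearly crisp and the updates behave like k-means, so sweeping m downward is one simple way to probe the sensitivity the abstract describes.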
