4.3 Article

A Validity Index for Prototype-Based Clustering of Data Sets With Complex Cluster Structures

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TSMCB.2010.2104319

Keywords

Cluster validity index; complex data structure; connectivity; Conn_Index; prototype-based clustering

Funding

  1. NASA, Science Mission Directorate [NNG05GA94G]


Evaluating how well the extracted clusters fit the true partitions of a data set is one of the fundamental challenges in unsupervised clustering, because the data structure and the number of clusters are unknown a priori. Cluster validity indices are commonly used to select the best partitioning from different clustering results; however, they are often inadequate unless the clusters are well separated or have parametric shapes. Prototype-based clustering (finding clusters by grouping the prototypes obtained by vector quantization of the data), which is becoming increasingly important for its effectiveness in the analysis of large high-dimensional data sets, adds another dimension to this challenge. For validity assessment of prototype-based clusterings, previously proposed indices (mostly devised for the evaluation of point-based clusterings) usually perform poorly. Their performance degrades further when they are applied to large data sets with complicated cluster structure. In this paper, we propose a new index, Conn_Index, which can be applied to data sets with a wide variety of clusters of different shapes, sizes, densities, or overlaps. We construct Conn_Index based on inter- and intra-cluster connectivities of prototypes. Connectivities are defined through a connectivity matrix, which is a weighted Delaunay graph where the weights indicate the local data distribution. Experiments on synthetic and real data indicate that Conn_Index outperforms the existing validity indices considered in this paper for the evaluation of prototype-based clustering results.
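
The abstract does not spell out how the connectivity matrix is computed, so the following is only a minimal sketch under an assumed construction: each data vector adds one count to the edge between its nearest and second-nearest prototype, and the counts are then symmetrized. The function names `connectivity_matrix` and `intra_cluster_fraction` are hypothetical, and `intra_cluster_fraction` is a simple illustration of comparing intra- and inter-cluster connectivity; it is not the Conn_Index formula, which is not given here.

```python
import numpy as np

def connectivity_matrix(data, prototypes):
    """Symmetric edge weights between prototypes (assumed construction).

    Each data vector contributes one count to the pair formed by its
    nearest and second-nearest prototype. Requires >= 2 prototypes.
    """
    n = len(prototypes)
    # Squared Euclidean distance from every data vector to every prototype.
    d2 = ((data[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    # Indices of the nearest and second-nearest prototype per data vector.
    nearest_two = np.argsort(d2, axis=1)[:, :2]
    counts = np.zeros((n, n), dtype=int)
    # np.add.at accumulates correctly when the same pair occurs repeatedly.
    np.add.at(counts, (nearest_two[:, 0], nearest_two[:, 1]), 1)
    # Symmetrize so the weight of edge (i, j) does not depend on direction.
    return counts + counts.T

def intra_cluster_fraction(conn, labels):
    """Share of total connectivity on edges whose two prototypes have the
    same cluster label. Illustrative only; NOT the paper's Conn_Index."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    return conn[same].sum() / conn.sum()

# Toy example: three prototypes on a line, three data vectors.
prototypes = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
data = np.array([[0.1, 0.0], [0.9, 0.0], [4.0, 0.0]])
conn = connectivity_matrix(data, prototypes)
fraction = intra_cluster_fraction(conn, [0, 0, 1])
```

In this toy case the first two prototypes share most of the data between them, so grouping them into one cluster keeps most of the connectivity inside clusters; a clustering that split them would score lower on this simple measure.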
