Article

Silhouette Analysis for Performance Evaluation in Machine Learning with Applications to Clustering

Journal

ENTROPY
Volume 23, Issue 6

Publisher

MDPI
DOI: 10.3390/e23060759

Keywords

k-means; kernel k-means; machine learning; nonlinear clustering; silhouette index; weighted clustering

Funding

  1. Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah [87-662-1441]

Grouping objects by similarity is a central task in machine learning, and k-means and kernel k-means are popular clustering methods. The authors' previous work introduced a weighted majority voting method based on normalized mutual information (NMI), which requires true labels for a training set. This study extends that work by proposing an unsupervised weighting function based on the Silhouette index, so that clustering results can be aggregated without a training set.
Grouping objects based on their similarities is a common and important task in machine learning applications. Many clustering methods have been developed; among them, k-means based methods are widely used, and several extensions, such as k-means++ and kernel k-means, have been developed to improve the original k-means method. K-means is a linear clustering method; that is, it divides the objects into linearly separable groups, while kernel k-means is a non-linear technique. Kernel k-means projects the elements to a higher-dimensional feature space using a kernel function and then groups them. Different kernel functions may perform differently on the same data set, so choosing the right kernel for an application can be challenging. In our previous work, we introduced a weighted majority voting method for clustering based on normalized mutual information (NMI). NMI is a supervised criterion: calculating it requires the true labels of a training set. In this study, we extend our previous work on aggregating clustering results by developing an unsupervised weighting function for cases where a training set is not available. The proposed weighting function is based on the Silhouette index, an unsupervised criterion that can be calculated without a training set. This makes our new method a more natural fit for clustering, which is itself an unsupervised task.
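
To make the idea of Silhouette-weighted voting concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It makes several assumptions that the abstract does not state: distances are Euclidean, the candidate partitions (for example, outputs of kernel k-means with different kernels) already use matching cluster labels, and each partition's weight is its mean Silhouette value clipped at zero. The function names `silhouette_index` and `weighted_vote` are hypothetical.

```python
from collections import Counter
from math import dist


def silhouette_index(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # common convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def weighted_vote(points, partitions):
    """Combine label-aligned partitions; each votes with its clipped silhouette."""
    # Assumption: weight = max(silhouette, 0); the paper's function may differ.
    weights = [max(silhouette_index(points, p), 0.0) for p in partitions]
    consensus = []
    for i in range(len(points)):
        tally = Counter()
        for w, p in zip(weights, partitions):
            tally[p[i]] += w
        consensus.append(max(tally, key=tally.get))
    return consensus, weights


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
partitions = [
    [0, 0, 0, 1, 1, 1],  # matches the visible grouping
    [0, 1, 0, 1, 0, 1],  # mixes the two groups
    [0, 0, 0, 1, 1, 0],  # one point assigned to the wrong group
]
consensus, weights = weighted_vote(points, partitions)
```

In this toy setup the partition that matches the visible grouping receives the largest weight, so it dominates the vote for every point. No true labels are used at any step, which is the property that distinguishes this weighting from an NMI-based one.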
