Silhouette Analysis for Performance Evaluation in Machine Learning with Applications to Clustering

ENTROPY (2021)

Journal

ENTROPY

Volume 23, Issue 6, Pages -

Publisher

MDPI

DOI: 10.3390/e23060759

Keywords

k-means; kernel k-means; machine learning; nonlinear clustering; silhouette index; weighted clustering

Funding

Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah [87-662-1441]

Automated Summary

Grouping objects based on similarities is crucial in machine learning, with k-means and kernel k-means being popular clustering methods. This study extends the authors' previous work, a weighted majority voting method based on NMI, by proposing an unsupervised weighting function based on the Silhouette index, so that clustering results can be aggregated without a training set.

Abstract

Grouping objects based on their similarities is a common and important task in machine learning applications. Many clustering methods have been developed; among them, k-means-based methods have been widely used, and several extensions, such as k-means++ and kernel k-means, have been developed to improve on the original k-means method. K-means is a linear clustering method; that is, it divides the objects into linearly separable groups, while kernel k-means is a non-linear technique. Kernel k-means projects the elements to a higher-dimensional feature space using a kernel function and then groups them. Different kernel functions may not perform equally well on a given data set, so choosing the right kernel for an application can be challenging. In our previous work, we introduced a weighted majority voting method for clustering based on normalized mutual information (NMI). NMI is a supervised criterion: the true labels of a training set are required to calculate it. In this study, we extend our previous work on aggregating clustering results by developing an unsupervised weighting function for settings where a training set is not available. The proposed weighting function is based on the Silhouette index, an unsupervised criterion, so no training set is required to calculate it. This makes our new method more consistent with the unsupervised nature of clustering.
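The idea of weighting several clusterings by their Silhouette index and aggregating them by majority vote can be illustrated with a minimal Python sketch. This is not the paper's implementation. The function names `silhouette_index` and `silhouette_weighted_vote` are hypothetical, the weight is assumed to be the mean silhouette width clipped at zero (the abstract does not state the exact weighting function), distances are plain Euclidean, and cluster labels are assumed to be already aligned across clusterings. The candidate labelings below are hand-written stand-ins for the outputs of kernel k-means runs with different kernels.

```python
from collections import defaultdict
from math import dist


def silhouette_index(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    clusters = defaultdict(list)
    for i, lab in enumerate(labels):
        clusters[lab].append(i)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # usual convention: s(i) = 0 for a singleton cluster
        # a: mean distance from point i to the other members of its cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance from point i to any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def silhouette_weighted_vote(points, clusterings):
    """Aggregate aligned clusterings by a silhouette-weighted majority vote.

    Assumption (not from the paper): each clustering's weight is its
    silhouette index with negative values clipped to zero.
    """
    weights = [max(silhouette_index(points, labels), 0.0) for labels in clusterings]
    consensus = []
    for i in range(len(points)):
        votes = defaultdict(float)
        for w, labels in zip(weights, clusterings):
            votes[labels[i]] += w
        consensus.append(max(votes, key=votes.get))
    return consensus, weights


# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusterings = [
    [0, 0, 0, 1, 1, 1],  # matches the visible structure
    [0, 1, 0, 1, 0, 1],  # mixes the two groups
    [0, 0, 1, 0, 1, 1],  # mixes the two groups differently
]
consensus, weights = silhouette_weighted_vote(points, clusterings)
print(consensus, weights)
```

In this toy setting the first labeling receives by far the largest weight, so it dominates the vote even though the other two labelings outnumber it. No true labels are used at any step, which is the practical difference from an NMI-based weight.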
