Article

Statistical Significance of Clustering with Multidimensional Scaling

Journal

Publisher

TAYLOR & FRANCIS INC
DOI: 10.1080/10618600.2023.2219708

Keywords

Cluster index; Dimension reduction; High-dimension low-sample size data; Principal component analysis; Unsupervised learning


Clustering is a fundamental tool for exploratory data analysis, and statistical significance of clustering (SigClust) is a cluster evaluation method for high-dimensional, low-sample size data. The original SigClust may not work well in certain cases and is not applicable when researchers only have the dissimilarity matrix. To address these issues, we propose a new SigClust method using multidimensional scaling (MDS) to achieve low-dimensional representations of the data and assess the statistical significance of clustering.
Clustering is a fundamental tool for exploratory data analysis. One central problem in clustering is deciding whether the clusters discovered by clustering methods are reliable, as opposed to being artifacts of natural sampling variation. Statistical significance of clustering (SigClust) is a recently developed cluster evaluation tool for high-dimension, low-sample size data. Despite its successful application to many scientific problems, there are cases where the original SigClust may not work well. Furthermore, for specific applications, researchers may not have access to the original data and only have the dissimilarity matrix. In this case, clustering is still a valuable exploratory tool, but the original SigClust is not applicable. To address these issues, we propose a new SigClust method using multidimensional scaling (MDS). The underlying idea behind MDS-based SigClust is that one can achieve low-dimensional representations of the original data via MDS using only the dissimilarity matrix and then apply SigClust in the low-dimensional MDS space. The proposed MDS-based SigClust can circumvent the challenge of parameter estimation of the original method in high-dimensional spaces while keeping the essential clustering structure in the MDS space. Both simulations and real data applications demonstrate that the proposed method works remarkably well for assessing the statistical significance of clustering. Supplementary materials for this article are available online.
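The two-step idea in the abstract (embed the dissimilarity matrix with MDS, then test for clustering in the embedded space) can be illustrated with a rough sketch. This is not the authors' implementation. It assumes classical (Torgerson) MDS, a 2-means cluster index (within-cluster sum of squares divided by total sum of squares), and a single-Gaussian null whose covariance is simply the sample covariance in MDS space. The function names `classical_mds`, `two_means_ci`, and `mds_sigclust_pvalue` and the defaults `k`, `n_sim`, and `seed` are hypothetical choices made for illustration.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed an n x n dissimilarity matrix in k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared dissimilarities
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]             # indices of the k largest eigenvalues
    lam = np.clip(vals[idx], 0.0, None)          # guard against small negative values
    return vecs[:, idx] * np.sqrt(lam)

def two_means_ci(X, rng, n_init=5, n_iter=50):
    """2-means cluster index: within-cluster SS / total SS (smaller means stronger clustering)."""
    total = ((X - X.mean(axis=0)) ** 2).sum()
    best = np.inf
    for _ in range(n_init):                      # a few random restarts of Lloyd's algorithm
        centers = X[rng.choice(len(X), size=2, replace=False)]
        for _ in range(n_iter):
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            if labels.min() == labels.max():     # one cluster is empty; abandon this restart
                break
            new = np.array([X[labels == j].mean(axis=0) for j in (0, 1)])
            if np.allclose(new, centers):
                break
            centers = new
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        best = min(best, d.min(axis=1).sum())
    return best / total

def mds_sigclust_pvalue(D, k=2, n_sim=200, seed=0):
    """Monte Carlo p-value: fraction of null cluster indices at or below the observed one."""
    rng = np.random.default_rng(seed)
    X = classical_mds(D, k)
    observed = two_means_ci(X, rng)
    # Assumed null model: one Gaussian with the sample covariance of the MDS coordinates.
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    null = np.array([
        two_means_ci(rng.multivariate_normal(np.zeros(k), cov, size=len(X)), rng)
        for _ in range(n_sim)
    ])
    return float((null <= observed).mean())
```

Calling `mds_sigclust_pvalue(D)` on a square dissimilarity matrix `D` returns an empirical p-value; a small value suggests the 2-means split is stronger than expected under the single-Gaussian null. The published method may differ in how it estimates the null covariance and selects the MDS dimension, so this sketch should be read only as a way to make the pipeline concrete.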

