Article

Statistical Significance of Clustering with Multidimensional Scaling

Publisher

TAYLOR & FRANCIS INC
DOI: 10.1080/10618600.2023.2219708

Keywords

Cluster index; Dimension reduction; High-dimension low-sample size data; Principal component analysis; Unsupervised learning

Clustering is a fundamental tool for exploratory data analysis, and statistical significance of clustering (SigClust) is a cluster evaluation method for high-dimensional, low-sample size data. The original SigClust may not work well in certain cases and is not applicable when researchers only have the dissimilarity matrix. To address these issues, we propose a new SigClust method using multidimensional scaling (MDS) to achieve low-dimensional representations of the data and assess the statistical significance of clustering.
Clustering is a fundamental tool for exploratory data analysis. One central problem in clustering is deciding whether the clusters discovered by clustering methods are reliable, as opposed to being artifacts of natural sampling variation. Statistical significance of clustering (SigClust) is a recently developed cluster evaluation tool for high-dimension, low-sample size data. Despite its successful application to many scientific problems, there are cases where the original SigClust may not work well. Furthermore, for specific applications, researchers may not have access to the original data and only have the dissimilarity matrix. In this case, clustering is still a valuable exploratory tool, but the original SigClust is not applicable. To address these issues, we propose a new SigClust method using multidimensional scaling (MDS). The underlying idea behind MDS-based SigClust is that one can achieve low-dimensional representations of the original data via MDS using only the dissimilarity matrix and then apply SigClust on the low-dimensional MDS space. The proposed MDS-based SigClust can circumvent the challenge of parameter estimation of the original method in high-dimensional spaces while keeping the essential clustering structure in the MDS space. Both simulations and real data applications demonstrate that the proposed method works remarkably well for assessing the statistical significance of clustering. Supplementary materials for this article are available online.
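
The two-stage idea in the abstract (embed with MDS from the dissimilarity matrix alone, then test for clustering in the embedded space) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes classical (Torgerson) MDS, a 2-means cluster index as the test statistic, and a single-Gaussian null whose covariance is the plain sample covariance of the embedded points; the function names `classical_mds`, `two_means_cluster_index`, and `mds_sigclust_sketch` are hypothetical, and the paper's own parameter estimation may differ.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed n points in k dimensions
    using only an n x n dissimilarity matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]             # indices of the k largest
    vals_k = np.clip(vals[idx], 0.0, None)       # guard against negative eigenvalues
    return vecs[:, idx] * np.sqrt(vals_k)

def two_means_cluster_index(X, n_init=10, n_iter=50, rng=None):
    """2-means cluster index: squared distance to the nearest of two
    centers, divided by the total sum of squares (smaller = stronger split)."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    total = ((X - X.mean(axis=0)) ** 2).sum()
    best = np.inf
    for _ in range(n_init):                      # several random restarts
        centers = X[rng.choice(n, size=2, replace=False)]
        for _ in range(n_iter):                  # Lloyd iterations
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            if labels.min() == labels.max():     # one cluster went empty
                break
            new_centers = np.array([X[labels == j].mean(axis=0) for j in (0, 1)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        best = min(best, d.min(axis=1).sum())
    return best / total

def mds_sigclust_sketch(D, k=2, n_sim=200, seed=0):
    """Monte Carlo p-value for the observed cluster index in MDS space.
    Assumption: the null is one Gaussian with the sample covariance of
    the embedded data (a simplification chosen for this sketch)."""
    rng = np.random.default_rng(seed)
    X = classical_mds(D, k)
    ci_obs = two_means_cluster_index(X, rng=rng)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    n = X.shape[0]
    ci_null = np.array([
        two_means_cluster_index(
            rng.multivariate_normal(np.zeros(k), cov, size=n), rng=rng)
        for _ in range(n_sim)
    ])
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + (ci_null <= ci_obs).sum()) / (n_sim + 1)
    return ci_obs, p_value
```

Calling `mds_sigclust_sketch(D)` on a symmetric dissimilarity matrix `D` returns the observed cluster index and a Monte Carlo p-value; a small p-value suggests the two-cluster split is stronger than expected under the single-Gaussian null. The embedding dimension `k` and the number of simulations `n_sim` are illustrative defaults, not values taken from the article.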
