Article

Nonparametric cluster significance testing with reference to a unimodal null distribution

BIOMETRICS (2021)

Journal

BIOMETRICS

Volume 77, Issue 4, Pages 1215-1226

Publisher

WILEY

DOI: 10.1111/biom.13376

Keywords

cluster analysis; high-dimension low-sample size; hypothesis testing; unimodality; unsupervised learning

Categories

Biology; Mathematical & Computational Biology; Statistics & Probability

Funding

National Institute of Dental and Craniofacial Research [R03DE023592]
National Institute of Environmental Health Sciences [P03ES010126]
National Center for Advancing Translational Sciences [UL1RR025747]
National Science Foundation [DGE-1144081]

Summary

This paper proposes a novel method to evaluate the significance of identified clusters by comparing the variation explained by clustering the original data with the variation explained by clustering a unimodal reference distribution that preserves the covariance structure of the data. The approach is adapted for high-dimension low-sample size settings and can be used both to test the null hypothesis that the data cannot be partitioned into clusters and to determine the optimal number of clusters.

Abstract

Cluster analysis is an unsupervised learning strategy that is exceptionally useful for identifying homogeneous subgroups of observations in data sets of unknown structure. However, it is challenging to determine if the identified clusters represent truly distinct subgroups rather than noise. Existing approaches for addressing this problem tend to define clusters based on distributional assumptions, ignore the inherent correlation structure in the data, or are not suited for high-dimension low-sample size (HDLSS) settings. In this paper, we propose a novel method to evaluate the significance of identified clusters by comparing the explained variation due to the clustering from the original data to that produced by clustering a unimodal reference distribution that preserves the covariance structure in the data. The reference distribution is generated using kernel density estimation and thus does not require that the data follow a particular distribution. By utilizing sparse covariance estimation, the method is adapted for the HDLSS setting. The approach can be used to test the null hypothesis that the data cannot be partitioned into clusters and to determine the optimal number of clusters. Simulation examples, theoretical evaluations, and applications to temporomandibular disorder research and cancer microarray data illustrate the utility of the proposed method.
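
The comparison described in the abstract can be sketched roughly in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes 2-means clustering, at least two variables, and more observations than variables, so the sparse covariance estimation used for the HDLSS case is omitted. It also assumes a rotation to principal components to decorrelate the data, and a Gaussian kernel density estimate per component at the smallest bandwidth that gives a single mode. All function names and parameters are hypothetical.

```python
import numpy as np


def explained_variation(x, rng, n_init=5, n_iter=50):
    """Share of the total sum of squares explained by a 2-means partition."""
    total_ss = ((x - x.mean(axis=0)) ** 2).sum()
    best_within = total_ss  # a single cluster explains nothing
    for _ in range(n_init):
        centers = x[rng.choice(len(x), size=2, replace=False)]
        for _ in range(n_iter):
            dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            if labels.min() == labels.max():  # one cluster is empty; stop this start
                break
            centers = np.array([x[labels == k].mean(axis=0) for k in (0, 1)])
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        best_within = min(best_within, dist.min(axis=1).sum())
    return 1.0 - best_within / total_ss


def n_modes(values, h, n_grid=512):
    """Count local maxima of a Gaussian kernel density estimate on a grid."""
    grid = np.linspace(values.min() - 3 * h, values.max() + 3 * h, n_grid)
    dens = np.exp(-0.5 * ((grid[:, None] - values[None, :]) / h) ** 2).sum(axis=1)
    rising = np.diff(dens) > 0
    return int(np.sum(rising[:-1] & ~rising[1:]))  # rising, then not rising


def critical_bandwidth(values, n_steps=30):
    """Approximate the smallest bandwidth whose Gaussian KDE has one mode."""
    lo, hi = 0.0, values.std()
    while n_modes(values, hi) > 1:  # expand until the estimate is unimodal
        lo, hi = hi, 2 * hi
    for _ in range(n_steps):  # bisect between multimodal lo and unimodal hi
        mid = 0.5 * (lo + hi)
        if n_modes(values, mid) > 1:
            lo = mid
        else:
            hi = mid
    return hi


def reference_sample(scores, bandwidths, rng):
    """Draw a smoothed-bootstrap sample for each centered, uncorrelated component."""
    n, p = scores.shape
    out = np.empty((n, p))
    for j in range(p):
        z, h = scores[:, j], bandwidths[j]
        draw = rng.choice(z, size=n) + h * rng.standard_normal(n)
        out[:, j] = draw / np.sqrt(1.0 + h ** 2 / z.var())  # undo variance inflation
    return out


def cluster_p_value(x, n_ref=49, seed=0):
    """Monte Carlo p-value for the explained variation of a 2-means partition."""
    rng = np.random.default_rng(seed)
    mean = x.mean(axis=0)
    _, vecs = np.linalg.eigh(np.cov(x, rowvar=False))
    scores = (x - mean) @ vecs  # rotate to uncorrelated components
    bandwidths = [critical_bandwidth(scores[:, j]) for j in range(x.shape[1])]
    observed = explained_variation(x, rng)
    reference = []
    for _ in range(n_ref):
        sample = mean + reference_sample(scores, bandwidths, rng) @ vecs.T
        reference.append(explained_variation(sample, rng))
    p_value = (1 + sum(r >= observed for r in reference)) / (n_ref + 1)
    return observed, p_value


# Two well-separated simulated groups of 30 observations in two dimensions.
data_rng = np.random.default_rng(1)
clustered = np.vstack([data_rng.normal(-4, 1, (30, 2)), data_rng.normal(4, 1, (30, 2))])
observed, p_value = cluster_p_value(clustered)
print(f"explained variation = {observed:.2f}, p-value = {p_value:.2f}")
```

Because each component is sampled independently and rescaled to its original variance, rotating back yields a reference sample whose covariance matches the sample covariance in expectation. A small p-value indicates that clustering explains more variation in the data than it does under the unimodal reference.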
