Article

Cross-Study Replicability in Cluster Analysis

Journal

STATISTICAL SCIENCE
Volume 38, Issue 2, Pages 303-316

Publisher

INST MATHEMATICAL STATISTICS-IMS
DOI: 10.1214/22-STS871

Keywords

Clustering; replicability; multiple studies

In cancer research, clustering techniques are widely used for exploratory analyses, playing a critical role in the identification of novel cancer subtypes and in patient management. As data collected by multiple research groups grows, it is increasingly feasible to investigate the replicability of clustering procedures, that is, their ability to consistently recover biologically meaningful clusters across several data sets. In this paper, we review methods for assessing the replicability of clustering analyses and discuss a novel framework for evaluating cross-study clustering replicability, useful when two or more studies are available. Our approach can be applied to any clustering algorithm and can employ different measures of similarity between partitions to quantify replicability, both globally (i.e., for the whole sample) and locally (i.e., for individual clusters). Using experiments on synthetic and real gene expression data, we illustrate the usefulness of our procedure for evaluating whether the same clusters are identified consistently across a collection of data sets.
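
To make "measures of similarity between partitions" and the global/local distinction concrete, here is a minimal sketch. It is not the authors' procedure. It assumes a common setup in which the samples of one study carry two labelings: the clusters found directly in that study, and the clusters transferred to it from another study (for instance by a classifier trained on the first study's clusters; that transfer step is an assumption and is not shown). The global score is the adjusted Rand index (ARI). The local score gives each cluster its best Jaccard overlap with any cluster of the other partition. The function names and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Global agreement between two partitions of the same samples (Hubert-Arabie ARI)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must cover the same samples")
    total = comb(len(labels_a), 2)  # number of sample pairs
    if total == 0:
        return 1.0  # fewer than two samples: trivially identical
    # Pairs placed together in both partitions (from the contingency table)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # only when both are all-singletons or both are one cluster
    return (sum_ij - expected) / (max_index - expected)


def per_cluster_jaccard(labels_ref, labels_other):
    """Local agreement: best Jaccard overlap of each reference cluster with any other cluster."""
    def members(labels):
        groups = {}
        for idx, lab in enumerate(labels):
            groups.setdefault(lab, set()).add(idx)
        return groups

    ref_groups, other_groups = members(labels_ref), members(labels_other)
    return {
        lab: max(len(ref & oth) / len(ref | oth) for oth in other_groups.values())
        for lab, ref in ref_groups.items()
    }


if __name__ == "__main__":
    # Hypothetical labels for 8 samples of one study; label names are arbitrary.
    direct = [0, 0, 0, 1, 1, 1, 2, 2]        # clustering of this study
    transferred = [1, 1, 1, 0, 0, 2, 2, 2]   # clusters carried over from another study
    print(round(adjusted_rand_index(direct, transferred), 3))  # 0.619
    print({k: round(v, 3) for k, v in per_cluster_jaccard(direct, transferred).items()})
    # {0: 1.0, 1: 0.667, 2: 0.667}
```

Both scores ignore how the clusters are named, so relabeling the clusters of either study does not change them. In this toy case cluster 0 replicates perfectly while clusters 1 and 2 only partially match. A single global ARI of about 0.62 would hide that distinction, which is the motivation for also reporting a per-cluster (local) measure.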
