Journal
STATISTICAL SCIENCE
Volume 38, Issue 2, Pages 303-316
Publisher
INST MATHEMATICAL STATISTICS-IMS
DOI: 10.1214/22-STS871
Keywords
Clustering; replicability; multiple studies
In cancer research, clustering techniques are widely used for exploratory analyses, playing a critical role in the identification of novel cancer subtypes and in patient management. As the volume of data collected by multiple research groups grows, it is increasingly feasible to investigate the replicability of clustering procedures, that is, their ability to consistently recover biologically meaningful clusters across several data sets. In this paper, we review methods for assessing the replicability of clustering analyses and discuss a novel framework for evaluating cross-study clustering replicability, useful when two or more studies are available. Our approach can be applied to any clustering algorithm and can employ different measures of similarity between partitions to quantify replicability, both globally (i.e., for the whole sample) and locally (i.e., for individual clusters). Using experiments on synthetic and real gene expression data, we illustrate the usefulness of our procedure for evaluating whether the same clusters are identified consistently across a collection of data sets.
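The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea of cross-study replicability, not the authors' method. It assumes each study has already been clustered independently. It then transfers study A's cluster labels to study B's samples with a nearest-centroid rule, a simple stand-in chosen here that may differ from the classifier used in the paper. Finally, it compares the transferred labels with study B's own partition, using the adjusted Rand index as a global similarity measure and a per-cluster best-match Jaccard score as a local one. The function names and toy data are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Global similarity: adjusted Rand index between two partitions of the same samples."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("partitions must cover the same (at least two) samples")
    cells = Counter(zip(labels_a, labels_b))  # contingency-table counts
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def cluster_jaccard(labels_a, labels_b):
    """Local similarity: for each cluster in labels_a, its best Jaccard overlap with a cluster in labels_b."""
    members_a, members_b = {}, {}
    for i, (a, b) in enumerate(zip(labels_a, labels_b)):
        members_a.setdefault(a, set()).add(i)
        members_b.setdefault(b, set()).add(i)
    return {
        a: max(len(sa & sb) / len(sa | sb) for sb in members_b.values())
        for a, sa in members_a.items()
    }


def centroids(points, labels):
    """Mean feature vector of each cluster."""
    counts, sums = Counter(labels), {}
    for x, lab in zip(points, labels):
        acc = sums.setdefault(lab, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def transfer_labels(train_points, train_labels, test_points):
    """Assign each test sample to the nearest training-cluster centroid (squared Euclidean distance)."""
    cents = centroids(train_points, train_labels)

    def nearest(x):
        return min(cents, key=lambda lab: sum((xi - ci) ** 2 for xi, ci in zip(x, cents[lab])))

    return [nearest(x) for x in test_points]


# Toy example with hypothetical values: two "studies" sharing two well-separated groups.
study_a = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
labels_a = [0, 0, 1, 1]            # partition found in study A
study_b = [[0.1, 0.2], [5.1, 5.0], [4.9, 5.2], [0.0, -0.1]]
labels_b = ["x", "y", "y", "x"]    # partition found independently in study B

predicted_b = transfer_labels(study_a, labels_a, study_b)
print(adjusted_rand_index(predicted_b, labels_b))  # → 1.0
print(cluster_jaccard(predicted_b, labels_b))      # → {0: 1.0, 1: 1.0}
```

Both measures ignore how clusters are named, so study B's labels `"x"`/`"y"` can be matched to study A's `0`/`1` without any manual alignment. Any other partition-similarity index could replace the two shown here, in line with the abstract's point that the framework admits different measures.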