Article

Selective Inference for Hierarchical Clustering

Journal

Publisher

TAYLOR & FRANCIS INC
DOI: 10.1080/01621459.2022.2116331

Keywords

Difference in means; Hypothesis testing; Post-selection inference; Type I error

Funding

  1. NSERC Discovery Grants program
  2. NIH [R01-EB026908, R01-DA047869, R01-GM123993]
  3. NSF [DMS-1653017, DMS-1252624]
  4. Simons Investigator Award in Mathematical Modeling of Living Systems [560585]

This article proposes a selective inference approach to test for a difference in means between two clusters, addressing the issue of inflated Type I error rate when using classical tests with clustering-defined groups.
Classical tests for a difference in means control the Type I error rate when the groups are defined a priori. However, when the groups are instead defined via clustering, applying a classical test yields an extremely inflated Type I error rate. Notably, this problem persists even if two separate and independent datasets are used to define the groups and to test for a difference in their means. To address this problem, we propose a selective inference approach to test for a difference in means between two clusters. Our procedure controls the selective Type I error rate by accounting for the fact that the choice of null hypothesis was made based on the data. We describe how to efficiently compute exact p-values for clusters obtained using agglomerative hierarchical clustering with many commonly used linkages. We apply our method to simulated data and to single-cell RNA-sequencing data. Supplementary materials for this article are available online.
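
The inflation described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' selective test; it only reproduces the naive procedure the abstract warns against. It assumes a single Gaussian feature with no true clusters, Ward linkage cut into two clusters, and Welch's two-sample t-test, all of which are illustrative choices. The function name `naive_rejection_rate` and its parameters are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import ttest_ind


def naive_rejection_rate(n_sims=200, n=50, alpha=0.05, seed=0):
    """Fraction of null datasets on which a classical t-test rejects
    after the two groups were chosen by hierarchical clustering."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        # Null data: every observation comes from the same N(0, 1),
        # so there is no true difference in means between any groups.
        x = rng.normal(size=(n, 1))

        # Assumption: Ward linkage, dendrogram cut into two clusters.
        tree = linkage(x, method="ward")
        labels = fcluster(tree, t=2, criterion="maxclust")

        a = x[labels == 1, 0]
        b = x[labels == 2, 0]
        if min(len(a), len(b)) < 2:
            # Welch's t-test needs at least two points per group;
            # count this replicate as a non-rejection.
            continue

        # Classical test that ignores how the groups were chosen.
        p_value = ttest_ind(a, b, equal_var=False).pvalue
        if p_value < alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    rate = naive_rejection_rate()
    print(f"Naive rejection rate under the null: {rate:.2f} (nominal 0.05)")
```

Because the clustering step separates the observations along the very feature being tested, the naive rejection rate should land far above the nominal level, which is the behavior a selective test is designed to correct.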
