Article

Bootstrapping estimates of stability for clusters, observations and model selection

Journal

COMPUTATIONAL STATISTICS
Volume 34, Issue 1, Pages 349-372

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s00180-018-0830-y

Keywords

Ensemble; k-means; Jaccard coefficient; Clustering; Visualization

Funding

  1. National Science Foundation
  2. NSF [DMS 1557589, DMS 1312250, DMS 1557576, DMS 1557642, DMS 1557668, DMS 1557593]


Clustering is a challenging problem in unsupervised learning. In lieu of a gold standard, stability has become a valuable surrogate for performance and robustness. In this work, we propose a non-parametric bootstrapping approach to estimating the stability of a clustering method, which also captures the stability of the individual clusters and observations. This flexible framework enables different types of comparisons between clusterings and can be used in connection with two possible bootstrap approaches for stability. The first approach, scheme 1, can be used to assess confidence (stability) around the clustering of the original dataset based on bootstrap replications. The second approach, scheme 2, searches over the bootstrap clusterings for an optimally stable partitioning of the data. The two schemes accommodate different model assumptions that can be motivated by an investigator's trust (or lack thereof) in the original data and by additional computational considerations. We propose a hierarchical visualization extrapolated from the stability profiles that gives insight into the separation of groups, and projected visualizations for inspecting the stability of individual observations. Our approaches show good performance in simulation and on real data. These approaches can be implemented using the R package bootcluster, which is available on the Comprehensive R Archive Network (CRAN).
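
To make the idea behind scheme 1 concrete, the following is a minimal sketch in Python using only the standard library. It is not the authors' algorithm and not the bootcluster API; it is one plausible reading of the abstract, under these assumptions: k-means is the clustering method, each bootstrap clustering is mapped back onto the original observations by nearest centroid, and an observation's stability is the average Jaccard coefficient between its original cluster and its bootstrap-induced cluster. All function names (kmeans, nearest, jaccard, bootstrap_stability) are hypothetical.

```python
import random


def nearest(p, centroids):
    """Index of the centroid closest to point p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))


def mean(group):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(group)
    return tuple(sum(c) / n for c in zip(*group))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means on a list of tuples; returns the centroids."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty group keeps its previous centroid.
        new = [mean(g) if g else centroids[j] for j, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids


def jaccard(a, b):
    """Jaccard coefficient of two sets with a non-empty union."""
    return len(a & b) / len(a | b)


def bootstrap_stability(points, k, n_boot=20, seed=0):
    """Assumed scheme-1-style estimate: observation, cluster and overall stability."""
    rng = random.Random(seed)
    n = len(points)

    # Reference clustering of the original data.
    ref_centroids = kmeans(points, k, rng)
    ref = [nearest(p, ref_centroids) for p in points]
    ref_sets = [{i for i in range(n) if ref[i] == c} for c in range(k)]

    obs = [0.0] * n
    for _ in range(n_boot):
        # Non-parametric bootstrap: resample observations with replacement.
        sample = [points[rng.randrange(n)] for _ in range(n)]
        cents = kmeans(sample, k, rng)
        # Map the bootstrap clustering back onto the original observations.
        labels = [nearest(p, cents) for p in points]
        boot_sets = [{i for i in range(n) if labels[i] == c} for c in range(k)]
        for i in range(n):
            obs[i] += jaccard(ref_sets[ref[i]], boot_sets[labels[i]]) / n_boot

    # Cluster stability: mean over the cluster's members; overall: mean over all.
    cluster = [sum(obs[i] for i in s) / len(s) if s else float("nan")
               for s in ref_sets]
    overall = sum(obs) / n
    return obs, cluster, overall
```

Calling bootstrap_stability(points, k) on a list of coordinate tuples returns per-observation, per-cluster and overall stability values in [0, 1]. Averaging Jaccard coefficients is only one of the comparisons the framework allows, and scheme 2 (searching the bootstrap clusterings for the most stable partition) is not covered by this sketch.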
