CLUSTERING HIGH DIMENSION, LOW SAMPLE SIZE DATA USING THE MAXIMAL DATA PILING DISTANCE

STATISTICA SINICA (2012)

Journal

STATISTICA SINICA

Volume 22, Issue 2, Pages 443-464

Publisher

STATISTICA SINICA

DOI: 10.5705/ss.2010.148

Keywords

Hierarchical clustering; high dimension; low sample size data; maximal data piling; singular value decomposition

Funding

NSF [DMS-0805758]

Abstract

We propose a new hierarchical clustering method for high dimension, low sample size (HDLSS) data. The method exploits the fact that each individual data vector accounts for exactly one dimension of the subspace generated by HDLSS data. The linkage used to measure the distance between clusters is the orthogonal distance between the affine subspaces generated by each cluster. Ideally, one would consider all possible binary splits of the data and choose the split that maximizes this distance between the two resulting groups. Since this is not computationally feasible in general, we approximate it using the singular value decomposition. We justify the method theoretically by studying its high-dimensional asymptotics. We also obtain the probability distribution of the distance measure under the null hypothesis of no split, and use it to propose a criterion for determining the number of clusters. Simulations and an analysis of microarray data show competitive clustering performance of the proposed method.
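
To make the linkage in the abstract concrete, the following is a minimal Python sketch of the orthogonal distance between the affine subspaces generated by two clusters, together with the exhaustive search over binary splits that the abstract calls the ideal but generally infeasible implementation. It is not the authors' algorithm: the SVD-based approximation and the null-distribution criterion are not shown. The function names `affine_subspace_distance` and `best_binary_split` are hypothetical, and the sketch assumes the rows of each array are observations.

```python
from itertools import combinations

import numpy as np


def affine_subspace_distance(X1, X2):
    """Orthogonal distance between the affine hulls of the rows of X1 and X2.

    Illustrative helper (hypothetical name), not code from the paper.
    """
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-cluster centered rows span the direction spaces of both hulls.
    W = np.vstack([X1 - m1, X2 - m2])
    diff = m1 - m2
    # Project the mean difference onto that span; the residual is the part
    # orthogonal to both affine subspaces, and its norm is the distance.
    coef, *_ = np.linalg.lstsq(W.T, diff, rcond=None)
    residual = diff - W.T @ coef
    return float(np.linalg.norm(residual))


def best_binary_split(X):
    """Brute-force search over all binary splits (feasible only for tiny n)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    best_dist, best_split = -np.inf, None
    # Keep index 0 on the left so each unordered split is visited once.
    for k in range(0, n - 1):
        for extra in combinations(range(1, n), k):
            left = (0,) + extra
            right = tuple(i for i in range(n) if i not in left)
            dist = affine_subspace_distance(X[list(left)], X[list(right)])
            if dist > best_dist:
                best_dist, best_split = dist, (left, right)
    return best_dist, best_split


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 50))   # n = 6 observations, d = 50 dimensions
    X[:3, 0] += 100.0              # shift the first three rows apart
    print(best_binary_split(X))
```

The number of binary splits grows as 2^(n-1) - 1, so this enumeration is only practical for very small samples, which is the reason the abstract gives for resorting to an SVD-based approximation.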
