Article

Geometric consistency of principal component scores for high-dimensional mixture models and its application

Journal

SCANDINAVIAN JOURNAL OF STATISTICS
Volume 47, Issue 3, Pages 899-921

Publisher

WILEY
DOI: 10.1111/sjos.12432

Keywords

clustering; geometric representation; HDLSS; microarray; PCA; PC score

Funding

  1. Japan Society for the Promotion of Science (JSPS) [26800078, 15H01678, 17K19956]
  2. Grants-in-Aid for Scientific Research [26800078, 17K19956] Funding Source: KAKEN

Abstract

In this article, we consider clustering based on principal component analysis (PCA) for high-dimensional mixture models. We present theoretical reasons why PCA is effective for clustering high-dimensional data. First, we derive a geometric representation of high-dimension, low-sample-size (HDLSS) data taken from a two-class mixture model. With the help of the geometric representation, we give geometric consistency properties of sample principal component scores in the HDLSS context. We develop ideas of the geometric representation and provide geometric consistency properties for multiclass mixture models. We show that PCA can cluster HDLSS data under certain conditions in a surprisingly explicit way. Finally, we demonstrate the performance of the clustering using gene expression datasets.
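
The clustering idea summarized above can be illustrated with a minimal simulation sketch. It is not the authors' procedure or code. It assumes a balanced two-class Gaussian mixture with identity covariance and a mean shift in every coordinate, computes the first sample PC scores from the n x n dual (Gram) matrix, and assigns clusters by the sign of the score. The function names (`first_pc_scores`, `simulate_two_class`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def first_pc_scores(X):
    """First sample PC scores of an n x d data matrix X (rows are observations)."""
    Xc = X - X.mean(axis=0)            # center each variable
    G = Xc @ Xc.T                      # n x n dual (Gram) matrix, cheap when d >> n
    vals, vecs = np.linalg.eigh(G)     # eigenvalues in ascending order
    # Scores on the first PC equal sqrt(largest eigenvalue) times its eigenvector.
    return np.sqrt(vals[-1]) * vecs[:, -1]


def simulate_two_class(n1=10, n2=10, d=2000, delta=1.0, seed=0):
    """Assumed toy model: N(0, I_d) for class 0, N(delta * 1_d, I_d) for class 1."""
    rng = np.random.default_rng(seed)
    labels = np.array([0] * n1 + [1] * n2)
    X = rng.standard_normal((n1 + n2, d))
    X[labels == 1] += delta            # shift every coordinate of class 1
    return X, labels


X, labels = simulate_two_class()
scores = first_pc_scores(X)

# Cluster by the sign of the first PC score; the sign of an eigenvector is
# arbitrary, so accuracy is evaluated up to a swap of the two cluster labels.
pred = (scores > 0).astype(int)
accuracy = max(np.mean(pred == labels), np.mean(pred != labels))
print(f"clustering accuracy (up to label swap): {accuracy:.2f}")
```

Working with the Gram matrix rather than the d x d sample covariance matrix is a common choice in the HDLSS setting, since both share the same nonzero eigenvalues up to scaling and the Gram matrix is only n x n. Thresholding at zero is reasonable here only because the simulated classes are balanced; unbalanced classes would call for a different cut point.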
