Article

FINITE SAMPLE APPROXIMATION RESULTS FOR PRINCIPAL COMPONENT ANALYSIS: A MATRIX PERTURBATION APPROACH

Journal

ANNALS OF STATISTICS
Volume 36, Issue 6, Pages 2791-2817

Publisher

INST MATHEMATICAL STATISTICS
DOI: 10.1214/08-AOS618

Keywords

Principal component analysis; spiked covariance model; random matrix theory; matrix perturbation; phase transition

Funding

  1. Lord Sieff of Brimpton
  2. Hana and Julius Rosen

Abstract

Principal component analysis (PCA) is a standard tool for dimensional reduction of a set of n observations (samples), each with p variables. In this paper, using a matrix perturbation approach, we study the nonasymptotic relation between the eigenvalues and eigenvectors of PCA computed on a finite sample of size n, and those of the limiting population PCA as n -> infinity. As in machine learning, we present a finite sample theorem which holds with high probability for the closeness between the leading eigenvalue and eigenvector of sample PCA and population PCA under a spiked covariance model. In addition, we also consider the relation between finite sample PCA and the asymptotic results in the joint limit p, n -> infinity, with p/n = c. We present a matrix perturbation view of the phase transition phenomenon, and a simple linear-algebra based derivation of the eigenvalue and eigenvector overlap in this asymptotic limit. Moreover, our analysis also applies for finite p, n where we show that although there is no sharp phase transition as in the infinite case, either as a function of noise level or as a function of sample size n, the eigenvector of sample PCA may exhibit a sharp loss of tracking, suddenly losing its relation to the (true) eigenvector of the population PCA matrix. This occurs due to a crossover between the eigenvalue due to the signal and the largest eigenvalue due to noise, whose eigenvector points in a random direction.
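The spiked covariance setting and the eigenvector overlap described in the abstract can be illustrated numerically. The following is a minimal Python sketch, not code from the paper: it simulates a rank-one spiked model Sigma = I_p + ell * u u^T and compares the empirical squared overlap between the leading sample eigenvector and the population spike with the standard random-matrix limit (1 - c/ell^2)/(1 + c/ell) above the phase transition ell > sqrt(c), and 0 below it, where c = p/n. The specific values of p, n, and the spike strengths ell are illustrative assumptions.

```python
# Minimal simulation sketch (illustrative, not the paper's code):
# rank-one spiked covariance model Sigma = I_p + ell * u u^T with unit noise variance.
# Compare the empirical squared overlap |<v_hat, u>|^2 of the leading sample
# eigenvector with the standard asymptotic prediction in the joint limit
# p, n -> infinity with p/n = c.
import numpy as np

rng = np.random.default_rng(0)
p, n = 400, 800                     # illustrative dimension and sample size
c = p / n                           # aspect ratio, phase transition at ell = sqrt(c)

def empirical_overlap(ell):
    """Squared overlap between the leading sample eigenvector and the true spike."""
    u = np.zeros(p)
    u[0] = 1.0                                      # population spike direction
    z = rng.standard_normal(n)                      # signal coefficients
    noise = rng.standard_normal((n, p))             # isotropic noise, variance 1
    X = np.sqrt(ell) * np.outer(z, u) + noise       # n x p data matrix
    S = X.T @ X / n                                 # sample covariance
    _, eigvecs = np.linalg.eigh(S)
    v_hat = eigvecs[:, -1]                          # leading sample eigenvector
    return float(np.dot(v_hat, u) ** 2)

def asymptotic_overlap(ell):
    """Standard spiked-model limit of the squared overlap."""
    return (1 - c / ell**2) / (1 + c / ell) if ell > np.sqrt(c) else 0.0

for ell in [0.3, 0.7, 1.0, 2.0, 4.0]:
    print(f"ell={ell:4.1f}  empirical={empirical_overlap(ell):.3f}  "
          f"asymptotic={asymptotic_overlap(ell):.3f}")
```

At finite p, n the empirical overlap decays smoothly through the transition at ell close to sqrt(c) rather than dropping abruptly, consistent with the abstract's observation that the sharp phase transition appears only in the joint limit, while a finite-sample eigenvector can still lose track of the population direction once the noise eigenvalue overtakes the signal eigenvalue.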
