Article

Asymptotic performance of PCA for high-dimensional heteroscedastic data

Journal

JOURNAL OF MULTIVARIATE ANALYSIS
Volume 167, Pages 435-452

Publisher

ELSEVIER INC
DOI: 10.1016/j.jmva.2018.06.002

Keywords

Asymptotic random matrix theory; Heteroscedasticity; High-dimensional data; Principal component analysis; Subspace estimation

Funding

  1. National Science Foundation Graduate Research Fellowship [DGE] [1256260]
  2. ARO [W911NF-14-1-0634]
  3. DARPA [DARPA-16-43-D3M-FP-037]
  4. UM-SJTU data science seed fund
  5. NIH [U01 EB 018753]

Abstract

Principal Component Analysis (PCA) is a classical method for reducing the dimensionality of data by projecting them onto a subspace that captures most of their variation. Effective use of PCA in modern applications requires understanding its performance for data that are both high-dimensional and heteroscedastic. This paper analyzes the statistical performance of PCA in this setting, i.e., for high-dimensional data drawn from a low-dimensional subspace and degraded by heteroscedastic noise. We provide simplified expressions for the asymptotic PCA recovery of the underlying subspace, subspace amplitudes and subspace coefficients; the expressions enable both easy and efficient calculation and reasoning about the performance of PCA. We exploit the structure of these expressions to show that, for a fixed average noise variance, the asymptotic recovery of PCA for heteroscedastic data is always worse than that for homoscedastic data (i.e., for noise variances that are equal across samples). Hence, while average noise variance is often a practically convenient measure for the overall quality of data, it gives an overly optimistic estimate of the performance of PCA for heteroscedastic data. (C) 2018 Elsevier Inc. All rights reserved.
