Article

On Consistency and Sparsity for Principal Components Analysis in High Dimensions

Journal

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
Volume 104, Issue 486, Pages 682-693

Publisher

AMER STATISTICAL ASSOC
DOI: 10.1198/jasa.2009.0121

Keywords

Eigenvector estimation; Reduction of dimension; Regularization; Thresholding; Variable selection

Funding

  1. National Science Foundation [DMS 0505303, DMS 0072661] Funding Source: Medline
  2. NIBIB NIH HHS [R01 EB001988, R01 EB001988-14] Funding Source: Medline

Principal components analysis (PCA) is a classic method for reducing the dimensionality of data in the form of n observations (or cases) of a vector with p variables. Contemporary datasets often have p comparable with, or even much larger than, n. Our main assertions, in such settings, are (a) that some initial reduction in dimensionality is desirable before applying any PCA-type search for principal modes, and (b) that this initial reduction is best achieved by working in a basis in which the signals have a sparse representation. We describe a simple asymptotic model in which the estimate of the leading principal component vector via standard PCA is consistent if and only if p(n)/n -> 0. We provide a simple algorithm for selecting a subset of coordinates with largest sample variances, and show that if PCA is done on the selected subset, then consistency is recovered, even if p(n) >> n.
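The two-step procedure described in the abstract (keep only the coordinates with largest sample variance, then run ordinary PCA on that subset) can be sketched as follows. This is a minimal NumPy illustration on a toy spiked-covariance model, not the paper's exact algorithm; the function name, the subset size k, and the simulation parameters are all illustrative choices.

```python
import numpy as np

def pca_on_high_variance_subset(X, k):
    """Hypothetical sketch: select the k coordinates with the largest
    sample variance, run PCA on that subset, and embed the resulting
    leading eigenvector back into R^p."""
    variances = X.var(axis=0)
    idx = np.argsort(variances)[-k:]       # coordinates with largest sample variance
    cov = np.cov(X[:, idx], rowvar=False)  # sample covariance on the subset only
    _, eigvecs = np.linalg.eigh(cov)
    v = np.zeros(X.shape[1])
    v[idx] = eigvecs[:, -1]                # top eigenvector of the subset covariance
    return v / np.linalg.norm(v)

# Toy spiked model with a sparse leading component and p larger than n.
rng = np.random.default_rng(0)
n, p = 200, 500
u = np.zeros(p)
u[:10] = 1 / np.sqrt(10)                  # sparse true principal direction
factors = rng.normal(size=(n, 1))
X = 3.0 * factors @ u[None, :] + rng.normal(size=(n, p))

v_hat = pca_on_high_variance_subset(X, k=25)
print(abs(v_hat @ u))                     # alignment with the true direction, close to 1
```

Because the true component is sparse, its support coordinates have inflated marginal variances, so the variance-based screening recovers them with high probability; PCA on the small subset then estimates the direction well even though p > n.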
