Article

Robust covariance estimation for distributed principal component analysis

Journal

METRIKA
Volume 85, Issue 6, Pages 707-732

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s00184-021-00848-9

Keywords

Robust covariance estimation; Heavy-tailed data; Outliers; Principal component analysis; Distributed estimation

Funding

  1. NSF of China [11731012]
  2. Ten Thousands Talents Plan of Zhejiang Province [2018R52042]
  3. Fundamental Research Funds for the Central Universities

Abstract

This paper enhances the distributed PCA algorithm constructed by Fan et al. by utilizing robust covariance matrix estimators to handle heavy-tailed data. Theoretical results and extensive numerical trials indicate that the algorithm is robust to heavy-tailed data and outliers.
Fan et al. (Ann Stat 47(6):3009-3031, 2019) constructed a distributed principal component analysis (PCA) algorithm that significantly reduces the communication cost between multiple servers. However, its theoretical guarantees hold only for sub-Gaussian data. Motivated by this limitation, this paper strengthens their distributed PCA algorithm by using the robust covariance matrix estimators of Minsker (Ann Stat 46(6A):2871-2903, 2018) and Ke et al. (Stat Sci 34(3):454-471, 2019) to tame heavy-tailed data. The theoretical results show that when the sampling distribution has symmetric innovation with a bounded fourth moment, or is asymmetric with a finite sixth moment, the statistical error rate of the final estimator produced by the robust algorithm is similar to the rate obtained under sub-Gaussian tails. Extensive numerical trials support the theoretical analysis and indicate that the algorithm is robust to heavy-tailed data and outliers.
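
To make the pipeline described in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes the usual one-shot scheme for distributed PCA: each server estimates a covariance matrix locally, sends only its top-K eigenvectors, and the central server averages the resulting projection matrices and extracts the top-K eigenvectors of that average. The robust step is illustrated with a simple norm-truncation estimate in the spirit of the cited shrinkage estimators. The function names, the zero-mean assumption, and the threshold `tau` are hypothetical choices made for illustration only.

```python
import numpy as np


def truncated_covariance(X, tau):
    """Norm-truncated covariance estimate (illustrative, assumes zero-mean rows).

    Each outer product x x^T is rescaled so that its single nonzero
    eigenvalue, ||x||^2, is capped at tau. tau is a hypothetical tuning
    parameter, not a value taken from the paper.
    """
    sq_norms = np.sum(X ** 2, axis=1)
    # Weight min(1, tau / ||x||^2); the floor avoids division by zero.
    weights = np.minimum(1.0, tau / np.maximum(sq_norms, 1e-12))
    return (X * weights[:, None]).T @ X / X.shape[0]


def top_k_eigvecs(S, k):
    """Return the k leading eigenvectors of a symmetric matrix as columns."""
    _, vecs = np.linalg.eigh(S)  # eigenvalues come back in ascending order
    return vecs[:, ::-1][:, :k]


def distributed_robust_pca(blocks, k, tau):
    """One-shot aggregation: average local projectors, then re-extract top k."""
    d = blocks[0].shape[1]
    avg_proj = np.zeros((d, d))
    for X in blocks:
        # Local step: only the d-by-k eigenvector matrix would be communicated.
        V = top_k_eigvecs(truncated_covariance(X, tau), k)
        avg_proj += V @ V.T
    avg_proj /= len(blocks)
    return top_k_eigvecs(avg_proj, k)


# Usage on synthetic heavy-tailed data (Student t, 5 degrees of freedom),
# split across 4 hypothetical servers.
rng = np.random.default_rng(0)
scales = np.array([3.0, 2.0, 1.0, 1.0, 1.0])
data = rng.standard_t(df=5, size=(2000, 5)) * scales
V_hat = distributed_robust_pca(np.array_split(data, 4), k=2, tau=50.0)
```

Averaging the projectors `V @ V.T` rather than the eigenvector matrices themselves sidesteps the sign and rotation ambiguity of eigenvectors, so the local estimates can be combined without any alignment step.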
