Journal
METRIKA
Volume 85, Issue 6, Pages 707-732
Publisher
SPRINGER HEIDELBERG
DOI: 10.1007/s00184-021-00848-9
Keywords
Robust covariance estimation; Heavy-tailed data; Outliers; Principal component analysis; Distributed estimation
Funding
- NSF of China [11731012]
- Ten Thousands Talents Plan of Zhejiang Province [2018R52042]
- Fundamental Research Funds for the Central Universities
This paper enhances the distributed PCA algorithm constructed by Fan et al. by utilizing robust covariance matrix estimators to handle heavy-tailed data. Theoretical results and extensive numerical trials indicate that the algorithm is robust to heavy-tailed data and outliers.
Fan et al. (Ann Stat 47(6):3009-3031, 2019) constructed a distributed principal component analysis (PCA) algorithm that significantly reduces the communication cost between multiple servers. However, their algorithm's guarantees hold only for sub-Gaussian data. Motivated by this limitation, this paper enhances their distributed PCA algorithm by using the robust covariance matrix estimators of Minsker (Ann Stat 46(6A):2871-2903, 2018) and Ke et al. (Stat Sci 34(3):454-471, 2019) to tame heavy-tailed data. The theoretical results demonstrate that when the sampling distribution has symmetric innovation with a bounded fourth moment, or is asymmetric with a finite sixth moment, the statistical error rate of the final estimator produced by the robust algorithm is similar to that obtained under sub-Gaussian tails. Extensive numerical trials support the theoretical analysis and indicate that our algorithm is robust to heavy-tailed data and outliers.
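The general idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the usual one-round scheme: each server computes the top-K eigenvectors of a local covariance estimate, and a central node averages the resulting projection matrices and extracts their top-K eigenvectors. The local estimator here is a simplified norm-truncation estimator built from disjoint pairwise differences, chosen for brevity and only in the spirit of the cited robust estimators. The names `robust_cov`, `top_k_eigvecs`, `distributed_robust_pca`, and the truncation level `tau` are hypothetical.

```python
import numpy as np

def robust_cov(X, tau):
    """Truncated covariance estimate (illustrative assumption, tau > 0).

    Rows are paired so the unknown mean cancels: d_i = (x_{2i} - x_{2i+1}) / sqrt(2).
    Each rank-one term d_i d_i^T has spectral norm ||d_i||^2; terms whose
    norm exceeds tau are shrunk back to norm tau before averaging.
    """
    m = X.shape[0] // 2
    D = (X[0:2 * m:2] - X[1:2 * m:2]) / np.sqrt(2.0)
    sq = np.sum(D ** 2, axis=1)          # spectral norm of each d_i d_i^T
    w = tau / np.maximum(sq, tau)        # 1 if sq <= tau, else tau / sq
    return (D * w[:, None]).T @ D / m

def top_k_eigvecs(S, k):
    """Eigenvectors of symmetric S for its k largest eigenvalues (as columns)."""
    _, vecs = np.linalg.eigh(S)          # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k]

def distributed_robust_pca(shards, k, tau):
    """One communication round: average local projectors, then re-decompose."""
    d = shards[0].shape[1]
    P = np.zeros((d, d))
    for X in shards:                     # each shard lives on one server
        V = top_k_eigvecs(robust_cov(X, tau), k)
        P += V @ V.T                     # only a d x k matrix needs to be sent
    P /= len(shards)
    return top_k_eigvecs(P, k)
```

In this sketch each server transmits only its d x K eigenvector matrix rather than raw data, which is where the communication saving comes from. How to choose `tau` is left open here; the sketch treats it as a user-supplied tuning parameter.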