Article

A SHRINKAGE PRINCIPLE FOR HEAVY-TAILED DATA: HIGH-DIMENSIONAL ROBUST LOW-RANK MATRIX RECOVERY

Journal

ANNALS OF STATISTICS
Volume 49, Issue 3, Pages 1239-1266

Publisher

INST MATHEMATICAL STATISTICS-IMS
DOI: 10.1214/20-AOS1980

Keywords

Robust statistics; shrinkage; heavy-tailed data; trace regression; low-rank matrix recovery; high-dimensional statistics

Funding

  1. NSF [DMS-1662139, DMS-1712591, DMS-2015366]
  2. NIH [R01-GM072611-14]


This paper introduces a simple principle for robust statistical inference via appropriate shrinkage on the data. This widens the scope of high-dimensional techniques, relaxing the distributional conditions from sub-exponential or sub-Gaussian to the much weaker bounded second or fourth moment. As an illustration of this principle, we focus on robust estimation of the low-rank matrix Theta* in the trace regression model Y = Tr(Theta*^T X) + epsilon, which encompasses four popular problems: the sparse linear model, compressed sensing, matrix completion and multitask learning. We propose to apply the penalized least-squares approach to appropriately truncated or shrunk data. Under only a bounded (2 + delta)-th moment condition on the response, the proposed robust methodology yields an estimator with the same statistical error rates as established in the literature under sub-Gaussian errors. For the sparse linear model and multitask regression, we further allow the design to have only bounded fourth moments and obtain the same statistical rates. As a byproduct, we derive a robust covariance estimator with a concentration inequality and an optimal rate of convergence in spectral norm when the samples have only bounded fourth moments. This result is of independent interest: we reveal that in high dimensions the sample covariance matrix is not optimal, whereas the proposed robust covariance estimator achieves optimality. Extensive simulations are carried out to support the theory.
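The shrinkage idea in the abstract can be sketched in a few lines: truncate each entry of the data at a level tau before forming the usual estimator. The snippet below is an illustrative sketch only, not the authors' implementation; the heavy-tailed Student's t design and the particular choice of tau (a fourth-root scaling, up to constants) are assumptions for the demonstration.

```python
import numpy as np

def shrink(x, tau):
    """Element-wise truncation: psi_tau(x) = sign(x) * min(|x|, tau)."""
    return np.sign(x) * np.minimum(np.abs(x), tau)

def robust_covariance(X, tau):
    """Covariance of the element-wise truncated samples (rows of X)."""
    Xs = shrink(X, tau)
    return Xs.T @ Xs / X.shape[0]

rng = np.random.default_rng(0)
n, d = 2000, 5
# Heavy-tailed design: Student's t with 5 degrees of freedom has a
# finite fourth moment but no sub-Gaussian tail.
X = rng.standard_t(df=5, size=(n, d))
# Truncation level ~ (n / log(nd))^{1/4}; an illustrative choice up to constants.
tau = (n / np.log(n * d)) ** 0.25
Sigma_hat = robust_covariance(X, tau)
print(Sigma_hat.shape)
```

The truncation caps the influence of extreme observations, which is what lets the estimator concentrate under only a fourth-moment condition; the plain sample covariance (tau = infinity) lacks this protection.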

