Journal
ANNALS OF STATISTICS
Volume 49, Issue 4, Pages 2313-2335
Publisher
INST MATHEMATICAL STATISTICS-IMS
DOI: 10.1214/20-AOS2038
Keywords
Linear regression; sparsity; lasso; cross-validation
Funding
- NSF [DMS-1613091, CCF-1714305, IIS-1741162]
- ONR [N00014-18-1-2729]
Abstract
The Lasso is a popular regression method for high-dimensional problems in which the number of parameters θ_1, ..., θ_N is larger than the number n of samples: N > n. A useful heuristic relates the statistical properties of the Lasso estimator to those of a simple soft-thresholding denoiser, in a denoising problem in which the parameters (θ_i)_{i ≤ N} are observed in Gaussian noise with a carefully tuned variance. Earlier work confirmed this picture in the limit n, N → ∞, pointwise in the parameters θ and in the value of the regularization parameter. Here, we consider a standard random design model and prove exponential concentration of its empirical distribution around the prediction provided by the Gaussian denoising model. Crucially, our results are uniform with respect to θ belonging to ℓ_q balls, q ∈ [0, 1], and with respect to the regularization parameter. This allows us to derive sharp results for the performance of various data-driven procedures to tune the regularization. Our proofs make use of Gaussian comparison inequalities, and in particular of a version of Gordon's minimax theorem developed by Thrampoulidis, Oymak and Hassibi, which controls the optimum value of the Lasso optimization problem. In addition, we prove a stability property of the minimizer in Wasserstein distance that allows one to characterize properties of the minimizer itself.
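The soft-thresholding denoiser η(y; τ) = sign(y)·max(|y| − τ, 0) that the abstract relates to the Lasso also drives the simplest Lasso solver, iterative soft-thresholding (ISTA). The following is a minimal numpy sketch, not taken from the paper: the dimensions, sparsity level, noise scale, and regularization value are illustrative choices, and ISTA is used only as a convenient way to compute the Lasso estimate in the N > n regime described above.

```python
import numpy as np

def soft_threshold(x, tau):
    """Soft-thresholding denoiser eta(x; tau) = sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(X, y, lam, n_iter=1000):
    """Solve min_theta 0.5 * ||y - X theta||^2 + lam * ||theta||_1 by
    iterative soft-thresholding (gradient step + soft-threshold)."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y)
        theta = soft_threshold(theta - grad / L, lam / L)
    return theta

# Illustrative high-dimensional setup (N > n) with a sparse signal,
# standard Gaussian random design, and Gaussian noise.
rng = np.random.default_rng(0)
n, N, s = 100, 200, 5
theta_star = np.zeros(N)
theta_star[:s] = 3.0                        # s-sparse parameter vector
X = rng.standard_normal((n, N)) / np.sqrt(n)
y = X @ theta_star + 0.1 * rng.standard_normal(n)

theta_hat = ista(X, y, lam=0.3)             # lam on the order of sigma * sqrt(2 log N / n)
```

With a regularization level of this order, the Lasso estimate recovers the support of θ* while shrinking the nonzero entries by roughly λ, the same bias exhibited by the soft-thresholding denoiser in the scalar Gaussian model.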