Journal
ANNALS OF STATISTICS
Volume 50, Issue 2, Pages 949-986
Publisher
INST MATHEMATICAL STATISTICS-IMS
DOI: 10.1214/21-AOS2133
Keywords
Regression; interpolation; overparametrization; ridge regression; random matrix theory
Funding
- NSF [DMS-1407548, IIS-1837931, DMS-1613091, CCF-1714305, IIS1741162, DMS-1554123]
- ONR [N00014-18-1-2729]
- NIH [5R01-EB001988-2]
Summary
This paper studies minimum ℓ2-norm interpolation least squares regression in the high-dimensional regime, for both linear and nonlinear feature models. The analysis recovers the double descent behavior of the prediction risk and identifies potential benefits of overparametrization.
Abstract
Interpolators, estimators that achieve zero training error, have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum ℓ2-norm (ridgeless) interpolation least squares regression, focusing on the high-dimensional regime in which the number of unknown parameters p is of the same order as the number of samples n. We consider two different models for the feature distribution: a linear model, where the feature vectors x_i ∈ ℝ^p are obtained by applying a linear transform to a vector of i.i.d. entries, x_i = Σ^{1/2} z_i (with z_i ∈ ℝ^p); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, x_i = φ(W z_i) (with z_i ∈ ℝ^d, W ∈ ℝ^{p×d} a matrix of i.i.d. entries, and φ an activation function acting componentwise on W z_i). We recover, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the double descent behavior of the prediction risk and the potential benefits of overparametrization.
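The minimum ℓ2-norm (ridgeless) interpolator described in the abstract is the pseudoinverse least squares solution β̂ = X⁺y, which achieves zero training error whenever p > n. Below is a minimal NumPy sketch under the paper's linear feature model x_i = Σ^{1/2} z_i; the specific covariance Σ, signal β*, and noise level are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200  # overparametrized regime: p > n

# Linear feature model from the abstract: x_i = Sigma^{1/2} z_i,
# with z_i a vector of i.i.d. standard normal entries.
# The covariance spectrum below is an arbitrary toy choice.
Z = rng.standard_normal((n, p))
Sigma_half = np.diag(np.sqrt(np.linspace(0.5, 1.5, p)))
X = Z @ Sigma_half

# Noisy linear response y = X beta* + noise (toy signal and noise level).
beta_star = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta_star + 0.1 * rng.standard_normal(n)

# Minimum ell_2-norm (ridgeless) interpolator: beta_hat = X^+ y.
# Among all beta with X beta = y, the pseudoinverse picks the one
# of smallest ell_2 norm.
beta_hat = np.linalg.pinv(X) @ y

train_err = np.linalg.norm(X @ beta_hat - y)
print(f"training error: {train_err:.2e}")  # essentially zero: interpolation
```

Equivalently, β̂ is the λ → 0⁺ limit of the ridge regression estimator (XᵀX + λI)⁻¹Xᵀy, which is why the paper calls it "ridgeless" regression.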