Article

SURPRISES IN HIGH-DIMENSIONAL RIDGELESS LEAST SQUARES INTERPOLATION

Journal

ANNALS OF STATISTICS
Volume 50, Issue 2, Pages 949-986

Publisher

INST MATHEMATICAL STATISTICS-IMS
DOI: 10.1214/21-AOS2133

Keywords

Regression; interpolation; overparametrization; ridge regression; random matrix theory

Funding

  1. NSF [DMS-1407548, IIS-1837931, DMS-1613091, CCF-1714305, IIS-1741162, DMS-1554123]
  2. ONR [N00014-18-1-2729]
  3. NIH [5R01-EB001988-2]

Abstract

This paper studies minimum $\ell_2$ norm (ridgeless) interpolation in least squares regression in the high-dimensional regime, for both linear and nonlinear feature models. It characterizes, in a precise quantitative way, the double descent behavior of the prediction risk and the potential benefits of overparametrization.
Interpolators, estimators that achieve zero training error, have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum $\ell_2$ norm (ridgeless) interpolation least squares regression, focusing on the high-dimensional regime in which the number of unknown parameters $p$ is of the same order as the number of samples $n$. We consider two different models for the feature distribution: a linear model, where the feature vectors $x_i \in \mathbb{R}^p$ are obtained by applying a linear transform to a vector of i.i.d. entries, $x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$, $W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d. entries, and $\varphi$ an activation function acting componentwise on $W z_i$). We recover, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the double descent behavior of the prediction risk and the potential benefits of overparametrization.
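As a concrete illustration of the object studied here, the following is a minimal simulation sketch (not the authors' code) of the minimum-norm least squares solution $\hat\beta = X^{+} y$, which interpolates the data whenever $p > n$. It assumes the isotropic special case of the linear model ($\Sigma = I$); the sample size, signal scaling, and grid of aspect ratios $\gamma = p/n$ are illustrative choices, not values from the paper.

```python
# Minimal sketch: test risk of the min-l2-norm (ridgeless) least squares fit
# across the overparametrization ratio gamma = p / n. Isotropic linear model
# (Sigma = I); n, sigma, and the gamma grid are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 200, 1.0  # sample size and noise level (assumed values)

def risk_min_norm(p, n_test=2000):
    """Out-of-sample risk of beta_hat = pinv(X) @ y, the min-norm solution."""
    beta = rng.standard_normal(p) / np.sqrt(p)   # signal scaled so ||beta||^2 ~ 1
    X = rng.standard_normal((n, p))              # feature matrix, Sigma = I
    y = X @ beta + sigma * rng.standard_normal(n)
    beta_hat = np.linalg.pinv(X) @ y             # interpolates the data when p > n
    X_test = rng.standard_normal((n_test, p))
    return np.mean((X_test @ (beta_hat - beta)) ** 2)

# Sweeping gamma through 1 traces the double descent shape: the risk blows up
# near the interpolation threshold p = n and descends again as p / n grows.
for gamma in [0.2, 0.5, 0.8, 0.95, 1.1, 1.5, 2.0, 5.0]:
    print(f"gamma = {gamma:4.2f}  risk ~ {risk_min_norm(int(gamma * n)):.3f}")
```

Averaging over repeated draws would smooth the printed risk curve; a single draw per $\gamma$ is kept here only to keep the sketch short.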

Authors

Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J. Tibshirani
