Article

Implicit Bias of Gradient Descent for Mean Squared Error Regression with Two-Layer Wide Neural Networks

Publisher

MICROTOME PUBL

Keywords

implicit bias; overparametrized neural network; cubic spline interpolation; smoothing spline; effective capacity


This article investigates gradient descent training of wide neural networks and the resulting implicit bias in function space. The authors show that training shallow ReLU networks of various widths yields a solution that closely fits the training data while minimizing the 2-norm of the second derivative weighted by a curvature penalty. They also analyze how different initialization procedures shape the resulting functions. The results apply to both univariate and multivariate regression tasks.
We investigate gradient descent training of wide neural networks and the corresponding implicit bias in function space. For univariate regression, we show that the solution of training a width-n shallow ReLU network is within n^(-1/2) of the function which fits the training data and whose difference from the initial function has the smallest 2-norm of the second derivative weighted by a curvature penalty that depends on the probability distribution used to initialize the network parameters. We compute the curvature penalty function explicitly for various common initialization procedures. For instance, asymmetric initialization with a uniform distribution yields a constant curvature penalty, and hence the solution function is the natural cubic spline interpolation of the training data. For stochastic gradient descent we obtain the same implicit bias result, and we obtain a similar result for different activation functions. For multivariate regression we show an analogous result, whereby the second derivative is replaced by the Radon transform of a fractional Laplacian. For initialization schemes that yield a constant penalty function, the solutions are polyharmonic splines. Moreover, we show that the training trajectories are captured by trajectories of smoothing splines with decreasing regularization strength.
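The univariate claim can be illustrated numerically in a simplified setting: for a shallow ReLU network with breakpoints drawn uniformly over the data range and output weights started at zero, gradient descent on the squared loss converges to the minimum-norm interpolant of the random ReLU features, which should closely track the natural cubic spline through the data. The sketch below trains only the output layer (a simplification of the paper's full setup), and the toy data, width, and seed are our own choices, not taken from the paper:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hedged illustration: with uniform ReLU breakpoints, the minimum-norm
# interpolant of the random features (the limit of gradient descent from
# zero output weights) approximates the natural cubic spline of the data.
# This trains only the output layer -- a simplified surrogate for the
# paper's setting, with hyperparameters chosen for illustration.

rng = np.random.default_rng(0)

# Toy 1-D training data
x_train = np.array([-1.0, -0.4, 0.1, 0.6, 1.0])
y_train = np.sin(3.0 * x_train)

n = 2000                              # network width
s = rng.choice([-1.0, 1.0], size=n)   # input weights: random signs
b = rng.uniform(-1.0, 1.0, size=n)    # biases: uniform breakpoints

def features(x):
    """Hidden-layer activations relu(s_i * x + b_i), shape (len(x), n)."""
    return np.maximum(np.outer(x, s) + b, 0.0)

# Gradient descent on the squared loss from zero output weights converges
# to the minimum-norm least-squares solution, computed here in closed form.
Phi = features(x_train)
a, *_ = np.linalg.lstsq(Phi, y_train, rcond=None)

# Natural cubic spline interpolant of the same data, for comparison
spline = CubicSpline(x_train, y_train, bc_type="natural")

x_test = np.linspace(-1.0, 1.0, 201)
gap = np.max(np.abs(features(x_test) @ a - spline(x_test)))
print(f"max deviation from natural cubic spline: {gap:.3f}")
```

The network interpolates the five training points exactly, and increasing the width n should shrink the printed gap to the spline, consistent with the abstract's convergence statement (we have not verified the exact n^(-1/2) rate with this toy setup).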
