Article

Loss landscapes and optimization in over-parameterized non-linear systems and neural networks

Journal

APPLIED AND COMPUTATIONAL HARMONIC ANALYSIS

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.acha.2021.12.009

Keywords

Deep learning; Non-linear optimization; Over-parameterized models; PL* condition

Funding

  1. National Science Foundation [IIS-1815697]
  2. Simons Foundation through the Collaboration on the Theoretical Foundations of Deep Learning [DMS-2031883, 814639]
  3. Google Faculty Research Award

Abstract

This paper proposes a modern view and a general mathematical framework for loss landscapes and efficient optimization in over-parameterized machine learning models and systems of non-linear equations. The optimization landscapes of such systems are generally non-convex but satisfy the PL* condition on most of the parameter space, which guarantees both the existence of solutions and efficient optimization by gradient methods.
The success of deep learning is due, to a large extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks. The purpose of this work is to propose a modern view and a general mathematical framework for loss landscapes and efficient optimization in over-parameterized machine learning models and systems of non-linear equations, a setting that includes over-parameterized deep neural networks. Our starting observation is that optimization landscapes corresponding to such systems are generally not convex, even locally around a global minimum, a condition we call essential non-convexity. We argue that instead they satisfy PL*, a variant of the Polyak-Łojasiewicz condition [32,25], on most (but not all) of the parameter space, which guarantees both the existence of solutions and efficient optimization by (stochastic) gradient descent (SGD/GD). The PL* condition of these systems is closely related to the condition number of the tangent kernel associated with the non-linear system, showing how a PL*-based non-linear theory parallels classical analyses of over-parameterized linear equations. We show that wide neural networks satisfy the PL* condition, which explains the (S)GD convergence to a global minimum. Finally, we propose a relaxation of the PL* condition applicable to almost over-parameterized systems. © 2021 Elsevier Inc. All rights reserved.
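For context on the condition named in the abstract, the following is a standard formulation of the PL* condition and the convergence rate it yields; the notation and normalization below are ours and may differ from the paper's. A non-negative loss $\mathcal{L}(\mathbf{w})$ with zero infimum satisfies the $\mu$-PL* condition on a set $S \subset \mathbb{R}^m$ if

\[ \tfrac{1}{2}\,\|\nabla \mathcal{L}(\mathbf{w})\|^2 \;\ge\; \mu\, \mathcal{L}(\mathbf{w}) \qquad \text{for all } \mathbf{w} \in S, \quad \mu > 0. \]

For the square loss $\mathcal{L}(\mathbf{w}) = \tfrac{1}{2}\,\|F(\mathbf{w}) - \mathbf{y}\|^2$ of a system of $n$ equations $F(\mathbf{w}) = \mathbf{y}$ in $m \ge n$ parameters, the tangent kernel $K(\mathbf{w}) = DF(\mathbf{w})\, DF(\mathbf{w})^{\top}$ satisfies $\tfrac{1}{2}\,\|\nabla\mathcal{L}(\mathbf{w})\|^2 = \tfrac{1}{2}\,(F(\mathbf{w})-\mathbf{y})^{\top} K(\mathbf{w})\, (F(\mathbf{w})-\mathbf{y}) \ge \lambda_{\min}(K(\mathbf{w}))\, \mathcal{L}(\mathbf{w})$, so $\mu$ can be taken as the smallest eigenvalue of the tangent kernel over $S$; this is the sense in which the PL* constant is tied to the conditioning of the tangent kernel. If, in addition, $\mathcal{L}$ is $\beta$-smooth, gradient descent $\mathbf{w}_{t+1} = \mathbf{w}_t - \eta\,\nabla\mathcal{L}(\mathbf{w}_t)$ with step size $\eta \le 1/\beta$ converges linearly to a global minimum while the iterates remain in $S$:

\[ \mathcal{L}(\mathbf{w}_t) \;\le\; (1 - \eta\mu)^{t}\, \mathcal{L}(\mathbf{w}_0). \]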
