Article

Improved Linear Convergence of Training CNNs With Generalizability Guarantees: A One-Hidden-Layer Case

Journal

IEEE Transactions on Neural Networks and Learning Systems

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/TNNLS.2020.3007399

Keywords

Neural networks; Convergence; Training; Complexity theory; Training data; Testing; Accelerated gradient descent (GD); convolutional neural networks; generalizability; global optimality; linear convergence

Funding

  1. Air Force Office of Scientific Research (AFOSR) [FA9550-20-1-0122]
  2. National Science Foundation, Directorate for Engineering, Division of Electrical, Communications and Cyber Systems [1932196]
  3. Rensselaer-IBM AI Research Collaboration, a part of the IBM AI Horizons Network

Abstract

This research analyzes the learning problem of one-hidden-layer nonoverlapping convolutional neural networks with the ReLU activation function from the perspective of model estimation. The results show that the accelerated gradient descent (GD) algorithm converges to the ground-truth parameters (up to the noise level) at a linear rate and faster than vanilla GD. The study also theoretically establishes the sample complexity, i.e., the number of training samples required.
We analyze the learning problem of one-hidden-layer nonoverlapping convolutional neural networks with the rectified linear unit (ReLU) activation function from the perspective of model estimation. The training outputs are assumed to be generated by the neural network with unknown ground-truth parameters plus additive noise, and the objective is to estimate the model parameters by minimizing a nonconvex squared loss function of the training data. Assuming that the training set contains a finite number of samples drawn from the Gaussian distribution, we prove that the accelerated gradient descent (GD) algorithm with a proper initialization converges to the ground-truth parameters (up to the noise level) at a linear rate, even though the learning problem is nonconvex. Moreover, the convergence rate is proved to be faster than that of vanilla GD. The initialization can be achieved by the existing tensor initialization method. In contrast to existing works that assume an infinite number of samples, we theoretically establish the sample complexity, that is, the required number of training samples. Although the neural network considered here is not deep, this is the first work to show that accelerated GD algorithms can find the global optimizer of this nonconvex learning problem for neural networks. It is also the first work to characterize the sample complexity of gradient-based methods in learning convolutional neural networks with the nonsmooth ReLU activation function, and it provides the tightest bound to date on the estimation error with respect to the output noise.
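To make the setup above concrete, the following is a minimal NumPy sketch of the data model and an accelerated (heavy-ball) GD iteration on the squared loss; it is not the authors' code. The patch size, number of patches, sample size, noise level, step size, and momentum parameter are assumed illustrative values, and a random perturbation of the ground-truth filter stands in for the tensor initialization method used in the paper.

    import numpy as np

    # Hypothetical sketch (not the authors' code): one-hidden-layer nonoverlapping
    # CNN f(x; w) = (1/K) * sum_k ReLU(w^T x_k), where the input is split into
    # K nonoverlapping patches of size p and all patches share the filter w.
    rng = np.random.default_rng(0)

    p, K, n = 8, 4, 2000          # patch size, number of patches, sample size (assumed)
    sigma = 0.01                   # additive output-noise level (assumed)
    eta, beta, T = 0.5, 0.4, 300   # step size, momentum, iterations (assumed, not the paper's constants)

    w_star = rng.normal(size=p)                 # unknown ground-truth filter
    X = rng.normal(size=(n, K, p))              # Gaussian training inputs, patch-wise

    def forward(X, w):
        # average of ReLU responses over the K nonoverlapping patches
        return np.maximum(X @ w, 0.0).mean(axis=1)

    y = forward(X, w_star) + sigma * rng.normal(size=n)   # noisy training outputs

    def grad(w):
        # gradient of the empirical squared loss (1/(2n)) * sum_i (f(x_i; w) - y_i)^2
        z = X @ w                                   # (n, K) pre-activations
        r = np.maximum(z, 0.0).mean(axis=1) - y     # residuals
        mask = (z > 0).astype(float) / K            # ReLU subgradient, averaged over patches
        return (r[:, None] * np.einsum('nk,nkp->np', mask, X)).mean(axis=0)

    # Heavy-ball accelerated GD; a crude local initialization stands in for
    # the tensor initialization discussed in the paper.
    w = w_star + 0.3 * rng.normal(size=p)
    w_prev = w.copy()
    for t in range(T):
        w_next = w - eta * grad(w) + beta * (w - w_prev)
        w_prev, w = w, w_next

    print("estimation error:", np.linalg.norm(w - w_star))

The update w_{t+1} = w_t - eta * grad(w_t) + beta * (w_t - w_{t-1}) is one standard form of accelerated GD; the specific accelerated scheme analyzed in the paper, together with its step-size and momentum choices and the linear convergence rate, is given in the article itself.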

