Article

Effect of the Initial Configuration of Weights on the Training and Function of Artificial Neural Networks

Journal

MATHEMATICS
Volume 9, Issue 18, Article 2246 (2021)

Publisher

MDPI
DOI: 10.3390/math9182246

Keywords

training; evolution of weights; deep learning; neural networks; artificial intelligence

Funding

  1. FCT/MEC [UIDB/50025/2020, UIDP/50025/2020]
  2. FCT [CEECIND/04697/2017]
  3. FCT/MCTES [YBN2020075021]
  4. EU funds [UIDB/50008/2020-UIDP/50008/2020, PTDC/EEI-TEL/30685/2017]
  5. Integrated Programme of SRTD SOCA [CENTRO-01-0145-FEDER-000010]
  6. Centro 2020 program, Portugal 2020, European Union, through the European Regional Development Fund

Abstract

The study statistically characterizes the deviation of the weights of two-hidden-layer feedforward ReLU networks trained via Stochastic Gradient Descent (SGD) from their initial values, finding that successful training leaves the network in the vicinity of its initial weight configuration; within the overfitting region, however, the deviation increases abruptly.
The function and performance of a neural network are largely determined by the evolution of its weights and biases during training, from the initial configuration of these parameters to one of the local minima of the loss function. We perform a quantitative statistical characterization of the deviation of the weights of two-hidden-layer feedforward ReLU networks of various sizes, trained via Stochastic Gradient Descent (SGD), from their initial random configuration. We compare the evolution of the distribution function of this deviation with the evolution of the loss during training. We observe that successful training via SGD leaves the network in the close neighborhood of the initial configuration of its weights. For each initial weight of a link, we measure the distribution function of its deviation after training and determine how the moments and the peak of this distribution depend on the initial weight. We also explore the evolution of these deviations during training and observe an abrupt increase within the overfitting region, occurring simultaneously with a similarly abrupt increase in the loss. Our results suggest that SGD's ability to efficiently find local minima is restricted to the vicinity of the random initial configuration of weights.
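The measurement described in the abstract is straightforward to reproduce in outline. The sketch below (PyTorch; not the authors' code, with illustrative layer sizes, synthetic data, and hyperparameters) trains a two-hidden-layer ReLU network with plain SGD, snapshots the initial weights, and tracks the per-link deviation w(t) - w(0) alongside the loss, so the pooled deviation distribution and its moments can be inspected after training.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data as a stand-in for the tasks studied in the paper.
X = torch.randn(1024, 20)
y = torch.randn(1024, 1)

# A two-hidden-layer feedforward ReLU network (sizes here are illustrative).
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

# Snapshot the initial random configuration of weights, w(0).
init = copy.deepcopy(model.state_dict())

opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

for epoch in range(200):
    for i in range(0, len(X), 64):  # plain mini-batch SGD
        xb, yb = X[i:i + 64], y[i:i + 64]
        opt.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        opt.step()

    # Per-link deviation d = w(t) - w(0), pooled over all weight matrices.
    devs = torch.cat([
        (p.detach() - init[name]).flatten()
        for name, p in model.named_parameters()
        if name.endswith("weight")
    ])
    if epoch % 50 == 0 or epoch == 199:
        print(f"epoch {epoch:3d}  loss {loss.item():.4f}  "
              f"mean |d| {devs.abs().mean().item():.4f}  "
              f"max |d| {devs.abs().max().item():.4f}")

# Moments of the empirical deviation distribution after training; the paper
# additionally resolves them as a function of each link's initial weight.
print("mean:", devs.mean().item(), " std:", devs.std().item())
```

Resolving the deviation distribution per initial weight, as the paper does, amounts to binning the entries of devs by the corresponding entries of init instead of pooling them all together.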
