Article

The general inefficiency of batch training for gradient descent learning

Journal

Neural Networks
Volume 16, Issue 10, Pages 1429-1451

Publisher

Pergamon-Elsevier Science Ltd
DOI: 10.1016/S0893-6080(03)00138-2

Keywords

batch training; on-line training; gradient descent; backpropagation; learning rate; optimization; stochastic approximation; generalization


Gradient descent training of neural networks can be done in either a batch or on-line manner. A widely held myth in the neural network community is that batch training is as fast as or faster than on-line training, and/or more 'correct', because it supposedly uses a better approximation of the true gradient for its weight updates. This paper explains why batch training is almost always slower than on-line training, often by orders of magnitude, especially on large training sets. The main reason is that on-line training can follow curves in the error surface throughout each epoch, which allows it to safely use a larger learning rate and thus converge in fewer iterations through the training data. Empirical results on a large (20,000-instance) speech recognition task and on 26 other learning tasks demonstrate that convergence can be reached significantly faster using on-line training than batch training, with no apparent difference in accuracy. (C) 2003 Elsevier Ltd. All rights reserved.
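For illustration only (not taken from the paper): a minimal Python sketch contrasting the two update schedules the abstract describes, namely one batch weight update per epoch versus one on-line update per training instance, on a simple least-squares problem. The data, learning rates, and function names are assumptions chosen to make the contrast concrete, not the paper's experimental setup.

```python
import numpy as np

# Illustrative data: 200 instances, 5 features, noisy linear targets (assumed, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=200)

def batch_epoch(w, lr):
    # Batch training: one weight update per epoch, using the gradient
    # averaged over all training instances.
    grad = X.T @ (X @ w - y) / len(X)
    return w - lr * grad

def online_epoch(w, lr):
    # On-line training: one weight update per instance, so the weights
    # move along the error surface within the epoch.
    for xi, yi in zip(X, y):
        grad = (xi @ w - yi) * xi
        w = w - lr * grad
    return w

w_batch = np.zeros(5)
w_online = np.zeros(5)
for _ in range(20):                      # 20 epochs through the training data
    w_batch = batch_epoch(w_batch, lr=0.1)
    w_online = online_epoch(w_online, lr=0.1)

print("batch MSE:  ", np.mean((X @ w_batch - y) ** 2))
print("on-line MSE:", np.mean((X @ w_online - y) ** 2))
```

With the same learning rate and the same number of passes through the data, the on-line schedule performs many more (smaller) weight updates per epoch, which is the mechanism behind the paper's claim about faster convergence.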

Authors

D. Randall Wilson; Tony R. Martinez
