Improved training of deep convolutional networks via minimum-variance regularized adaptive sampling

SOFT COMPUTING (2023)

Journal

SOFT COMPUTING

Volume 27, Issue 18, Pages 13237-13253

Publisher

SPRINGER

DOI: 10.1007/s00500-022-07131-7

Keywords

Deep learning; Convolutional neural networks; Gradient descent; Importance sampling

Automated Summary

An adaptive sampling method based on importance sampling (IS) is introduced to improve the training of deep neural networks (DNNs). Experimental results show that the method improves the speed and reduces the variance of training without significantly affecting classification.

Abstract

Fostered by technological and theoretical developments, deep neural networks (DNNs) have achieved great success in many applications, but their training via mini-batch stochastic gradient descent (SGD) can be very costly due to the possibly tens of millions of parameters to be optimized and the large amounts of training examples that must be processed. The computational cost is exacerbated by the inefficiency of the uniform sampling typically used by SGD to form the training mini-batches: since not all training examples are equally relevant for training, sampling these under a uniform distribution is far from optimal, making the case for the study of improved methods to train DNNs. A better strategy is to sample the training instances under a distribution where the probability of being selected is proportional to the relevance of each individual instance; one way to achieve this is through importance sampling (IS), which minimizes the gradients' variance w.r.t. the network parameters, consequently improving convergence. In this paper, an IS-based adaptive sampling method to improve the training of DNNs is introduced. This method exploits side information to construct the optimal sampling distribution and is dubbed regularized adaptive sampling (RAS). Experimental comparison using deep convolutional networks for classification of the MNIST and CIFAR-10 datasets shows that when compared against SGD and against another sampling method in the state of the art, RAS produces improvements in the speed and variance of the training process without incurring significant overhead or affecting the classification.
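The idea of sampling training instances in proportion to their relevance can be illustrated with a minimal sketch. This is not the RAS method from the paper, whose details are not given in the abstract. It is a generic importance-sampled mini-batch estimator on a toy linear regression problem, assuming that the per-example gradient norm serves as the relevance score. The function names, the `smoothing` parameter, and the toy data are all hypothetical. Each sampled gradient is reweighted by 1 / (N p_i) so that the estimate of the mean gradient stays unbiased under the non-uniform distribution.

```python
import numpy as np

def sampling_distribution(scores, smoothing=0.1):
    """Turn non-negative relevance scores into sampling probabilities.

    `smoothing` mixes in the uniform distribution so that no example gets
    probability zero (an illustrative safeguard, not taken from the paper).
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    total = scores.sum()
    p = scores / total if total > 0 else np.full(n, 1.0 / n)
    return (1.0 - smoothing) * p + smoothing / n

def is_gradient_estimate(per_example_grads, probs, batch_size, rng):
    """Unbiased estimate of the mean gradient from a non-uniform mini-batch."""
    n = per_example_grads.shape[0]
    idx = rng.choice(n, size=batch_size, replace=True, p=probs)
    weights = 1.0 / (n * probs[idx])  # importance weights 1 / (N p_i)
    return (weights[:, None] * per_example_grads[idx]).mean(axis=0)

def estimator_variance(per_example_grads, probs, batch_size):
    """Exact total variance (summed over coordinates) of the estimator above."""
    n = per_example_grads.shape[0]
    mean_grad = per_example_grads.mean(axis=0)
    sq_norms = np.sum(per_example_grads ** 2, axis=1)
    second_moment = np.sum(sq_norms / (n ** 2 * probs))
    return (second_moment - np.sum(mean_grad ** 2)) / batch_size

# Toy data: per-example gradients of a squared loss for a linear model at w = 0.
rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(size=n)
w = np.zeros(d)
grads = (X @ w - y)[:, None] * X  # shape (n, d)
full_grad = grads.mean(axis=0)

uniform = np.full(n, 1.0 / n)
by_norm = sampling_distribution(np.linalg.norm(grads, axis=1))

var_uniform = estimator_variance(grads, uniform, batch_size=32)
var_is = estimator_variance(grads, by_norm, batch_size=32)
print("uniform sampling variance:   ", var_uniform)
print("importance sampling variance:", var_is)
```

In a real training loop the per-example gradient norms are expensive to compute and change as the parameters are updated, which is why practical methods rely on cheaper, adaptively updated estimates of relevance.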
