Article

Weighted Aggregating Stochastic Gradient Descent for Parallel Deep Learning

Journal

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Volume 34, Issue 10, Pages 5037-5050

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2020.3047894

Keywords

Optimization; Deep learning; Convergence; Stochastic processes; Mathematical model; Boltzmann distribution; Task analysis; Stochastic optimization; stochastic gradient descent; parallel computing; deep learning; neural network

Funding

  1. National Science Foundation of China [62076112, 91746109]
  2. Institute of Education Sciences [R305A18027]

Abstract

This paper investigates the stochastic optimization problem with a focus on developing scalable parallel algorithms for deep learning tasks. Our solution involves a reformulation of the objective function for stochastic optimization in neural network models, along with a novel parallel computing strategy, coined weighted aggregating stochastic gradient descent (WASGD). Following a theoretical analysis of the characteristics of the new objective function, WASGD introduces a decentralized weighted aggregating scheme based on the performance of local workers. Without any center variable, the new method automatically gauges the importance of local workers and weights them according to their contributions. Furthermore, we develop an enhanced version of the method, WASGD+, by (1) implementing a designed sample order and (2) upgrading the weight evaluation function. To validate the new method, we benchmark our pipeline against several popular algorithms, including state-of-the-art deep neural network training techniques (e.g., elastic averaging SGD). Comprehensive validation studies are conducted on four classic datasets: CIFAR-100, CIFAR-10, Fashion-MNIST, and MNIST. The results firmly establish the advantage of the WASGD scheme in accelerating the training of deep architectures. Better still, the enhanced version, WASGD+, is shown to be a significant improvement over its prototype.
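
The abstract describes WASGD only at a high level, so the minimal single-process Python sketch below illustrates the general idea of decentralized, performance-weighted aggregation. The toy least-squares objective, the softmax (Boltzmann-style) weight function over local losses, and the temperature parameter tau are illustrative assumptions, not the paper's exact formulation; a real implementation would run the workers in parallel on a deep network.

import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: minimize 0.5 * mean((X w - y)^2) over w.
# This stands in for a neural network loss purely for illustration.
X = rng.normal(size=(512, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=512)

def loss_and_grad(w, batch):
    xb, yb = X[batch], y[batch]
    err = xb @ w - yb
    return 0.5 * np.mean(err ** 2), xb.T @ err / len(batch)

n_workers, lr, tau = 4, 0.05, 5.0    # tau: assumed Boltzmann temperature
workers = [np.zeros(10) for _ in range(n_workers)]

for step in range(200):
    losses, proposals = [], []
    for w in workers:
        batch = rng.choice(len(X), size=32, replace=False)
        loss, grad = loss_and_grad(w, batch)
        losses.append(loss)
        proposals.append(w - lr * grad)   # local SGD step on the worker's mini-batch

    # Performance-based weights: lower local loss -> larger weight
    # (a softmax over negative losses, i.e., a Boltzmann distribution).
    scores = -tau * np.array(losses)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()

    # Decentralized aggregation: the update is a weighted average of the
    # workers' proposals; no central parameter server variable is kept.
    aggregate = sum(a * p for a, p in zip(weights, proposals))
    workers = [aggregate.copy() for _ in range(n_workers)]

print("final training loss:", loss_and_grad(workers[0], np.arange(len(X)))[0])

Unlike elastic averaging SGD, which pulls every worker toward a shared center variable, this sketch lets each worker's influence on the aggregate depend on its current performance, mirroring the center-free, contribution-weighted scheme described in the abstract.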
