Article

A Sharp Estimate on the Transient Time of Distributed Stochastic Gradient Descent

Journal

IEEE TRANSACTIONS ON AUTOMATIC CONTROL
Volume 67, Issue 11, Pages 5900-5915

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TAC.2021.3126253

Keywords

Convex optimization; distributed optimization; stochastic gradient descent (SGD); stochastic programming

Funding

  1. National Science Foundation [IIS-1914792, DMS-1664644, CNS-1645681, ECCS-1933027]
  2. Office of Naval Research through the Multidisciplinary University Research Initiative program [N00014-19-1-2571]
  3. National Institutes of Health [R01 GM135930, UL54 TR004130]
  4. U.S. Department of Energy [DE-AR-0001282, DE-EE0009696]
  5. Boston University Kilachand Fund for Integrated Life Science and Engineering
  6. Shenzhen Research Institute of Big Data [J00120190011]
  7. National Natural Science Foundation of China [62003287]
  8. Shenzhen Science and Technology Program [CUHKSZWDZC0004]


Abstract

This article is concerned with minimizing the average of n cost functions over a network, in which agents may communicate and exchange information with each other. We consider the setting where only noisy gradient information is available. To solve the problem, we study the distributed stochastic gradient descent (DSGD) method and perform a nonasymptotic convergence analysis. For strongly convex and smooth objective functions, DSGD asymptotically achieves, in expectation, the optimal network-independent convergence rate of centralized stochastic gradient descent. Our main contribution is to characterize the transient time needed for DSGD to approach this asymptotic convergence rate. Moreover, we construct a hard optimization problem that proves the sharpness of the obtained estimate. Numerical experiments demonstrate the tightness of the theoretical results.
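To make the setup concrete, below is a minimal sketch of one common form of the DSGD update on a ring network with Metropolis mixing weights. The quadratic local costs, the noise level, and the O(1/t) step-size schedule are illustrative assumptions, not the configuration analyzed or tested in the paper.

```python
# Minimal DSGD sketch: n agents on a ring, each holding a quadratic local cost
# f_i(x) = 0.5 * ||x - a_i||^2, taking noisy-gradient steps combined with a
# consensus (mixing) step. All problem parameters here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

n, d = 10, 5                      # number of agents, problem dimension
A = rng.standard_normal((n, d))   # row i is a_i; agent i's cost is 0.5*||x - a_i||^2
x_star = A.mean(axis=0)           # minimizer of the average cost

# Metropolis weights on a ring graph give a symmetric, doubly stochastic W.
W = np.zeros((n, n))
for i in range(n):
    for j in ((i - 1) % n, (i + 1) % n):
        W[i, j] = 1.0 / 3.0
    W[i, i] = 1.0 - W[i].sum()

sigma = 0.5                       # std of the zero-mean stochastic gradient noise
X = np.zeros((n, d))              # row i is agent i's local iterate

for t in range(1, 20001):
    # Noisy local gradients: grad f_i(x_i) = x_i - a_i, plus noise.
    G = (X - A) + sigma * rng.standard_normal((n, d))
    alpha = 2.0 / (t + 10)        # O(1/t) step size (strong convexity mu = 1 here)
    # One common DSGD form: local gradient step followed by mixing with W.
    X = W @ (X - alpha * G)

x_bar = X.mean(axis=0)
print("distance of averaged iterate to optimum:", np.linalg.norm(x_bar - x_star))
```

Under these illustrative assumptions, the averaged iterate eventually behaves like a centralized SGD run that pools all agents' noisy gradients; the transient time studied in the paper is, roughly, the number of iterations before this network-independent behavior sets in.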


Reviews

Primary rating: 4.7 (not enough ratings)

Secondary ratings
Novelty: not rated
Significance: not rated
Scientific rigor: not rated

Recommended: no data available