4.7 Article

On Distributed Nonconvex Optimization: Projected Subgradient Method for Weakly Convex Problems in Networks

Journal

IEEE TRANSACTIONS ON AUTOMATIC CONTROL
Volume 67, Issue 2, Pages 662-675

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TAC.2021.3056535

Keywords

Convergence; Optimization; Machine learning; Convex functions; Stochastic processes; Gradient methods; Training; Distributed optimization; multiagent systems; nonconvex optimization; projected stochastic subgradient method

Funding

  1. NSF [ECCS-1933878, AFOSR-15RT0767]

Ask authors/readers for more resources

The stochastic subgradient method is widely used for solving large-scale optimization problems in machine learning, especially when the problems are neither smooth nor convex. This article proposes a distributed implementation of the stochastic subgradient method with theoretical guarantee, demonstrating global convergence using the Moreau envelope stationarity measure. Additionally, it shows that deterministic DPSM linearly converges to sharp minima under a sharpness condition with geometrically diminishing step size.
The stochastic subgradient method is a widely used algorithm for solving large-scale optimization problems arising in machine learning. Often, these problems are neither smooth nor convex. Recently, Davis et al., 2018 characterized the convergence of the stochastic subgradient method for the weakly convex case, which encompasses many important applications (e.g., robust phase retrieval, blind deconvolution, biconvex compressive sensing, and dictionary learning). In practice, distributed implementations of the projected stochastic subgradient method (stoDPSM) are used to speed up risk minimization. In this article, we propose a distributed implementation of the stochastic subgradient method with a theoretical guarantee. Specifically, we show the global convergence of stoDPSM using the Moreau envelope stationarity measure. Furthermore, under a so-called sharpness condition, we show that deterministic DPSM (with a proper initialization) converges linearly to the sharp minima, using geometrically diminishing step size. We provide numerical experiments to support our theoretical analysis.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available