Article

On the Convergence of Hybrid Server-Clients Collaborative Training

Journal

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS
Volume 41, Issue 3, Pages 802-819

Publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/JSAC.2022.3229443

Keywords

Federated learning; convergence analysis; stochastic gradient descent; server-clients collaboration

This paper analyzes the model convergence of a new hybrid learning architecture that utilizes the dataset and computation power of the parameter server (PS) for collaborative model training with clients. The architecture combines parallel SGD at the clients with sequential SGD at the PS, and has shown advantages in both accuracy and convergence speed over clients-only and server-only training.
Modern distributed machine learning (ML) paradigms, such as federated learning (FL), utilize data distributed at different clients to train a global model. In such paradigms, local datasets never leave the clients for better privacy protection, and the parameter server (PS) only performs simple aggregation. In practice, however, there is often some amount of data available at the PS, and its computation capability is strong enough to carry out more demanding tasks than simple model aggregation. The focus of this paper is to analyze the model convergence of a new hybrid learning architecture, which leverages the PS dataset and its computation power for collaborative model training with clients. Different from FL, where stochastic gradient descent (SGD) is always computed in parallel across clients, the new architecture has both parallel SGD at the clients and sequential SGD at the PS. We analyze the convergence rate upper bounds of this aggregate-then-advance design for both strongly convex and non-convex loss functions. We show that when the local SGD uses an O(1/t) stepsize, the server SGD must scale its stepsize no slower than O(1/t^2) in order to strictly outperform local SGD with strongly convex loss functions. The theoretical findings are corroborated by numerical experiments, where advantages in terms of both accuracy and convergence speed over clients-only (local SGD and FedAvg) and server-only training are demonstrated.
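To make the aggregate-then-advance structure concrete, below is a minimal sketch of one plausible reading of the training loop: in each round, clients run parallel local SGD from the current global model, the PS averages the client models, and then the PS advances the averaged model with a few sequential SGD steps on its own data, using the O(1/t) client and O(1/t^2) server stepsize schedules mentioned in the abstract. The quadratic losses, noise level, and step counts are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Hypothetical strongly convex objectives standing in for client/server data;
# dimensions, targets, and noise are illustrative assumptions only.
rng = np.random.default_rng(0)
dim, n_clients = 5, 4
client_targets = [rng.normal(size=dim) for _ in range(n_clients)]
server_target = rng.normal(size=dim)

def stochastic_grad(w, target):
    """Noisy gradient of the loss 0.5 * ||w - target||^2."""
    return (w - target) + 0.1 * rng.normal(size=dim)

w = np.zeros(dim)            # global model held at the parameter server (PS)
local_steps, server_steps = 5, 5

for t in range(1, 101):
    eta_client = 1.0 / t     # O(1/t) local stepsize, as in the abstract
    eta_server = 1.0 / t**2  # server stepsize scaling no slower than O(1/t^2)

    # Parallel SGD at the clients, each starting from the current global model.
    client_models = []
    for target in client_targets:
        w_k = w.copy()
        for _ in range(local_steps):
            w_k -= eta_client * stochastic_grad(w_k, target)
        client_models.append(w_k)

    # Aggregate: average the client models at the PS ...
    w = np.mean(client_models, axis=0)

    # ... then advance: sequential SGD at the PS on its own dataset.
    for _ in range(server_steps):
        w -= eta_server * stochastic_grad(w, server_target)
```

With the server's stepsize decaying faster than the clients', the sequential PS steps refine the aggregated model without dominating the client updates, which is the regime in which the paper shows strict improvement over local SGD for strongly convex losses.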
