Article

On the Convergence of Hybrid Server-Clients Collaborative Training

Journal

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS
Volume 41, Issue 3, Pages 802-819

Publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/JSAC.2022.3229443

Keywords

Federated learning; convergence analysis; stochastic gradient descent; server-clients collaboration

This paper analyzes the model convergence of a new hybrid learning architecture that utilizes the dataset and computation power of the parameter server (PS) for collaborative model training with clients. The architecture combines parallel SGD at the clients with sequential SGD at the PS, and has shown advantages in both accuracy and convergence speed over clients-only and server-only training.
Modern distributed machine learning (ML) paradigms, such as federated learning (FL), utilize data distributed at different clients to train a global model. In such paradigms, local datasets never leave the clients for better privacy protection, and the parameter server (PS) only performs simple aggregation. In practice, however, there is often some amount of data available at the PS, and its computation capability is strong enough to carry out more demanding tasks than simple model aggregation. The focus of this paper is to analyze the model convergence of a new hybrid learning architecture, which leverages the PS dataset and its computation power for collaborative model training with clients. Different from FL, where stochastic gradient descent (SGD) is always computed in parallel across clients, the new architecture has both parallel SGD at the clients and sequential SGD at the PS. We analyze the convergence rate upper bounds of this aggregate-then-advance design for both strongly convex and non-convex loss functions. We show that when the local SGD uses an O(1/t) stepsize, the server SGD must scale its stepsize no slower than O(1/t^2) in order to strictly outperform local SGD with strongly convex loss functions. The theoretical findings are corroborated by numerical experiments, where advantages in terms of both accuracy and convergence speed over clients-only (local SGD and FedAvg) and server-only training are demonstrated.
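To make the aggregate-then-advance structure concrete, below is a minimal sketch of one plausible reading of the training loop: in each round, clients run parallel local SGD from the current global model, the PS averages the client models, and then the PS advances the averaged model with a few sequential SGD steps on its own data, using the O(1/t) client and O(1/t^2) server stepsize schedules mentioned in the abstract. The quadratic losses, noise level, and step counts are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Hypothetical strongly convex objectives standing in for client/server data;
# dimensions, targets, and noise are illustrative assumptions only.
rng = np.random.default_rng(0)
dim, n_clients = 5, 4
client_targets = [rng.normal(size=dim) for _ in range(n_clients)]
server_target = rng.normal(size=dim)

def stochastic_grad(w, target):
    """Noisy gradient of the loss 0.5 * ||w - target||^2."""
    return (w - target) + 0.1 * rng.normal(size=dim)

w = np.zeros(dim)            # global model held at the parameter server (PS)
local_steps, server_steps = 5, 5

for t in range(1, 101):
    eta_client = 1.0 / t     # O(1/t) local stepsize, as in the abstract
    eta_server = 1.0 / t**2  # server stepsize scaling no slower than O(1/t^2)

    # Parallel SGD at the clients, each starting from the current global model.
    client_models = []
    for target in client_targets:
        w_k = w.copy()
        for _ in range(local_steps):
            w_k -= eta_client * stochastic_grad(w_k, target)
        client_models.append(w_k)

    # Aggregate: average the client models at the PS ...
    w = np.mean(client_models, axis=0)

    # ... then advance: sequential SGD at the PS on its own dataset.
    for _ in range(server_steps):
        w -= eta_server * stochastic_grad(w, server_target)
```

With the server's stepsize decaying faster than the clients', the sequential PS steps refine the aggregated model without dominating the client updates, which is the regime in which the paper shows strict improvement over local SGD for strongly convex losses.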
