Article

FedSA: A Semi-Asynchronous Federated Learning Mechanism in Heterogeneous Edge Computing

Journal

IEEE Journal on Selected Areas in Communications
Volume 39, Issue 12, Pages 3654-3672

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/JSAC.2021.3118435

Keywords

Training; Servers; Computational modeling; Data models; Collaborative work; Analytical models; Edge computing; federated learning; semi-asynchronous mechanism; heterogeneity; non-IID

Funding

  1. National Natural Science Foundation of China (NSFC) [62132019, 61936015, 62102391, U1709217]

Abstract

Federated learning (FL) involves training machine learning models over distributed edge nodes (i.e., workers) while facing three critical challenges: edge heterogeneity, Non-IID data, and communication resource constraints. In synchronous FL, the parameter server has to wait for the slowest workers, leading to significant waiting time due to edge heterogeneity. Though asynchronous FL can effectively tackle edge heterogeneity, it requires frequent model transfers, resulting in massive communication resource consumption. Moreover, differences in the relative frequency with which workers participate in asynchronous updating may seriously hurt training accuracy, especially on Non-IID data. In this paper, we propose a semi-asynchronous federated learning mechanism (FedSA), in which the parameter server aggregates a certain number of local models by their arrival order in each round. We theoretically analyze the quantitative relationship between the convergence bound of FedSA and different factors, e.g., the number of participating workers in each round, the degree of data Non-IIDness, and edge heterogeneity. Based on the convergence bound, we present an efficient algorithm to determine the number of participating workers that minimizes the training completion time. To further improve training accuracy on Non-IID data, FedSA deploys adaptive learning rates for workers according to their relative participation frequency. We also extend the proposed mechanism to dynamic and multiple-learning-task scenarios. Experimental results on a testbed show that the proposed mechanism and algorithms address the three challenges more effectively than state-of-the-art solutions.
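The core idea described in the abstract, aggregating only the first m local models to arrive in each round and scaling each worker's learning rate by its relative participation frequency, can be illustrated with a small sketch. The Python snippet below is a minimal toy simulation under assumed details: the Worker class, the simulated arrival times, and the simple inverse-frequency learning-rate rule are illustrative assumptions rather than the paper's actual algorithm, and the staleness of late-arriving models is simplified away.

import random

class Worker:
    """Minimal stand-in for a heterogeneous edge worker (illustrative only)."""
    def __init__(self, wid, speed):
        self.id = wid
        self.speed = speed  # heterogeneous compute capability

    def local_train(self, global_model, lr):
        # Dummy local update: perturb each parameter with a random gradient
        # scaled by the worker's (adaptive) learning rate.
        local = {k: v - lr * random.gauss(0.0, 1.0) for k, v in global_model.items()}
        arrival_time = random.expovariate(self.speed)  # slower workers tend to arrive later
        return local, arrival_time

def fedsa_round(global_model, workers, m, base_lr, counts, round_idx):
    # One semi-asynchronous round: the server aggregates only the first m
    # local models by arrival order; rarely participating workers get a
    # larger learning rate (simple inverse-frequency rule, assumed here).
    arrivals = []
    for w in workers:
        freq = (counts[w.id] + 1) / round_idx  # smoothed relative participation frequency
        lr = base_lr / freq                    # less frequent participation -> larger step
        local, t = w.local_train(global_model, lr)
        arrivals.append((t, w.id, local))

    arrivals.sort(key=lambda a: a[0])          # arrival order at the parameter server
    selected = arrivals[:m]                    # only the first m models are aggregated

    new_model = {k: sum(local[k] for _, _, local in selected) / m
                 for k in global_model}
    for _, wid, _ in selected:
        counts[wid] += 1
    return new_model

if __name__ == "__main__":
    random.seed(0)
    workers = [Worker(i, speed=random.uniform(0.5, 2.0)) for i in range(10)]
    model = {"w": 0.0, "b": 0.0}
    counts = {w.id: 0 for w in workers}
    for r in range(1, 21):
        model = fedsa_round(model, workers, m=4, base_lr=0.1, counts=counts, round_idx=r)
    print(model)
    print(counts)

Note that this toy regenerates all local models every round; in a genuinely semi-asynchronous setting, workers that miss the cut would typically keep training and deliver their (possibly stale) models in a later round.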
