4.7 Article

Optimizing Aggregation Frequency for Hierarchical Model Training in Heterogeneous Edge Computing

Journal

IEEE TRANSACTIONS ON MOBILE COMPUTING
Volume 22, Issue 7, Pages 4181-4194

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TMC.2022.3149584

Keywords

Edge intelligence; distributed machine learning; aggregation frequency

This paper proposes RAF, a resource-based aggregation frequency control method that optimizes the aggregation frequencies of edge devices in a hierarchical model training framework. RAF reduces waiting time and makes fuller use of device resources, and it dynamically adjusts aggregation frequencies during training to achieve fast convergence and high accuracy.
Federated Learning (FL) has been widely used for distributed machine learning in edge computing. In FL, model parameters are iteratively aggregated from the clients to a central server, which tends to become a communication bottleneck and a single point of failure. To address these drawbacks, hierarchical model training frameworks such as Hierarchical Federated Learning (HFL) and E-Tree learning have been proposed. One of the most challenging problems in a hierarchical model training framework is optimizing the aggregation frequencies of the edge devices at the various levels, because resource heterogeneity in an edge computing environment introduces synchronization delays (fast devices wait for slow workers) and significantly degrades training performance. This paper tackles the problem with weak synchronization, in which edge devices on the same level may have different frequencies of local updates and/or model aggregations. Existing works based on weak synchronization lack a way to quantitatively determine the aggregation frequency of each edge device. We therefore propose a resource-based aggregation frequency controlling method, termed RAF, which determines the optimal aggregation frequencies of edge devices to minimize the loss function according to their heterogeneous resources. The proposed method reduces waiting time and fully utilizes the resources of the edge devices. In addition, RAF dynamically adjusts the aggregation frequencies at different phases of model training to achieve fast convergence and high accuracy. We evaluated RAF through extensive experiments with real datasets on our self-developed edge computing testbed. The evaluation results demonstrate that RAF outperforms the benchmark approaches in terms of learning accuracy and convergence speed.
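The idea of weak synchronization can be illustrated with a small sketch. This is not the RAF algorithm from the paper, whose details are not given in the abstract; it is a simple heuristic under stated assumptions: each device gets as many local steps as fit in a shared time window, so fast devices do more work instead of waiting, and the resulting models are then averaged. All function names, the toy quadratic loss, and the `steps_per_sec` and `window_sec` parameters are hypothetical.

```python
from typing import List


def local_steps_for_window(steps_per_sec: List[float], window_sec: float) -> List[int]:
    """Assumed heuristic: each device runs as many local steps as fit in the window."""
    return [max(1, int(speed * window_sec)) for speed in steps_per_sec]


def local_sgd(w: List[float], target: List[float], steps: int, lr: float = 0.1) -> List[float]:
    """Run `steps` gradient steps on the toy loss 0.5 * ||w - target||^2."""
    w = list(w)
    for _ in range(steps):
        w = [wi - lr * (wi - ti) for wi, ti in zip(w, target)]
    return w


def weighted_average(models: List[List[float]], weights: List[float]) -> List[float]:
    """Aggregate device models coordinate-wise with the given weights."""
    total = sum(weights)
    dim = len(models[0])
    return [sum(wt * m[j] for m, wt in zip(models, weights)) / total for j in range(dim)]


def one_round(global_w: List[float], targets: List[List[float]],
              steps_per_sec: List[float], window_sec: float) -> List[float]:
    """One aggregation round: heterogeneous local step counts, then equal-weight averaging."""
    steps = local_steps_for_window(steps_per_sec, window_sec)
    local_models = [local_sgd(global_w, t, k) for t, k in zip(targets, steps)]
    return weighted_average(local_models, [1.0] * len(local_models))
```

In a hierarchical setting, the same `one_round` step could be applied at each level of the tree, with an edge aggregator playing the role of the server for its children. How to choose the per-device frequencies optimally is the question the paper addresses; the window-based rule here is only a placeholder.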
