Article

Faster Federated Learning With Decaying Number of Local SGD Steps

Journal

IEEE Transactions on Parallel and Distributed Systems

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPDS.2023.3277367

Keywords

Training; Convergence; Computational modeling; Servers; Data models; Costs; Benchmark testing; Computational efficiency; deep learning; edge computing; federated learning

This article proposes gradually reducing the number of Stochastic Gradient Descent (SGD) steps K performed on clients per round as Federated Learning (FL) training progresses. Decaying K improves the final performance of the FL model while reducing wall-clock training time and total computational cost. Thorough experiments on benchmark FL datasets demonstrate the real-world benefits of this approach in terms of convergence time, computational cost, and generalisation performance.
In Federated Learning (FL), client devices connected over the internet collaboratively train a machine learning model without sharing their private data with a central server or with other clients. The seminal Federated Averaging (FedAvg) algorithm trains a single global model by performing rounds of local training on clients followed by model averaging. FedAvg can improve the communication efficiency of training by performing more steps of Stochastic Gradient Descent (SGD) on clients in each round. However, client data in real-world FL is highly heterogeneous, which has been extensively shown to slow model convergence and harm final performance when K > 1 steps of SGD are performed on clients per round. In this article we propose decaying K as training progresses, which can jointly improve the final performance of the FL model whilst reducing the wall-clock time and the total computational cost of training compared to using a fixed K. We analyse the convergence of FedAvg with decaying K for strongly-convex objectives, providing novel insights into the convergence properties, and derive three theoretically-motivated decay schedules for K. We then perform thorough experiments on four benchmark FL datasets (FEMNIST, CIFAR100, Sentiment140, Shakespeare) to show the real-world benefit of our approaches in terms of convergence time, computational cost, and generalisation performance.
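To make the training loop concrete, below is a minimal sketch of FedAvg with a round-dependent K on a toy heterogeneous linear-regression problem. It is illustrative only: the client data, hyperparameters, and the inverse-decay k_schedule are assumptions made for this example; the paper derives three theoretically-motivated schedules that are not reproduced in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy heterogeneous setting: each client holds linear-regression data drawn
# around its own optimum, mimicking the non-IID clients the abstract discusses.
NUM_CLIENTS, DIM, N_PER_CLIENT = 10, 5, 100
true_ws = rng.normal(size=(NUM_CLIENTS, DIM))  # per-client optima
Xs = [rng.normal(size=(N_PER_CLIENT, DIM)) for _ in range(NUM_CLIENTS)]
ys = [X @ w + 0.1 * rng.normal(size=N_PER_CLIENT) for X, w in zip(Xs, true_ws)]

def local_sgd(w, X, y, k, lr=0.05, batch=10):
    """Run k steps of mini-batch SGD on one client's squared loss."""
    w = w.copy()
    for _ in range(k):
        idx = rng.choice(len(y), size=batch, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w -= lr * grad
    return w

def k_schedule(t, k0=32):
    """Hypothetical inverse decay of the local step count. NOT one of the
    paper's three derived schedules, just a stand-in with a similar shape."""
    return max(1, k0 // (1 + t))

w_global = np.zeros(DIM)
for t in range(50):
    k_t = k_schedule(t)
    # One FedAvg round: every client runs k_t local SGD steps from the
    # current global model, then the server averages the client models.
    client_models = [local_sgd(w_global, X, y, k_t) for X, y in zip(Xs, ys)]
    w_global = np.mean(client_models, axis=0)
    if t % 10 == 0:
        loss = np.mean([np.mean((X @ w_global - y) ** 2) for X, y in zip(Xs, ys)])
        print(f"round {t:2d}  K = {k_t:2d}  mean client loss = {loss:.4f}")
```

The design intuition matches the abstract: a large K early in training keeps communication costs low, while shrinking K later limits the client drift that heterogeneous data induces as the model approaches convergence.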
