Proceedings Paper

Differentially Quantized Gradient Descent

Publisher

IEEE
DOI: 10.1109/ISIT45174.2021.9518254

Keywords

-

Funding

  1. National Science Foundation (NSF) [CCF-1751356, CCF-1956386, CNS-0932428, CCF-1018927, CCF-1423663, CCF-1409204]
  2. Qualcomm Inc.
  3. NASA's Jet Propulsion Laboratory through the President and Director's Fund
  4. King Abdullah University of Science and Technology

Abstract

The study introduces a technique called differential quantization that enables quantized gradient descent algorithms to converge at the same rate as unquantized algorithms, with theoretical convergence guarantees. Experimental results validate the theoretical analysis.
Consider the following distributed optimization scenario. A worker has access to training data that it uses to compute the gradients, while a server decides when to stop the iterative computation based on its target accuracy or delay constraints. The only information that the server knows about the problem instance is what it receives from the worker via a rate-limited noiseless communication channel. We introduce the technique we call differential quantization (DQ), which compensates past quantization errors to make the descent trajectory of a quantized algorithm follow that of its unquantized counterpart. Assuming that the objective function is smooth and strongly convex, we prove that differentially quantized gradient descent (DQ-GD) attains a linear convergence rate of max{σ_GD, ρ_n 2^{-R}}, where σ_GD is the convergence rate of unquantized gradient descent (GD), ρ_n is the covering efficiency of the quantizer, and R is the bitrate per problem dimension n. Thus, at any rate R ≥ log_2(ρ_n/σ_GD), the convergence rate of DQ-GD is the same as that of unquantized GD, i.e., there is no loss due to quantization. We show a converse demonstrating that no GD-like quantized algorithm can converge faster than max{σ_GD, 2^{-R}}. Since quantizers exist with ρ_n → 1 as n → ∞ (Rogers, 1963), this means that DQ-GD is asymptotically optimal. In contrast, naively quantized GD, where the worker directly quantizes the gradient, attains only σ_GD + ρ_n 2^{-R}. The technique of differential quantization continues to apply to gradient methods with momentum, such as Nesterov's accelerated gradient descent and Polyak's heavy-ball method. For these algorithms as well, if the rate is above a certain threshold, there is no loss in convergence rate for the differentially quantized algorithm compared to its unquantized counterpart. Experimental results on both simulated and real-world least-squares problems validate our theoretical analysis.
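
The sketch below shows one way to realize the error-compensation idea on a least-squares problem, and is only an illustration under simplifying assumptions: the function names (quantize, dq_gd, naive_qgd), the uniform per-coordinate quantizer, the fixed dynamic range radius, and the problem sizes are stand-ins chosen here, not the covering-code quantizers or shrinking-range schedule analyzed in the paper. The worker tracks both the server's quantized iterate x and the unquantized GD trajectory u, and quantizes the differential ∇f(u) + (x − u)/η, so that x deviates from u only by the most recent quantization error rather than an accumulation of past errors.

```python
import numpy as np

# Hedged sketch: differential quantization applied to GD on a least-squares
# objective f(x) = 0.5 * ||A x - b||^2. The uniform quantizer, the fixed
# dynamic range `radius`, and all problem sizes are illustrative stand-ins.

def quantize(v, radius, bits):
    """Uniform quantizer on [-radius, radius] with `bits` bits per coordinate."""
    levels = 2 ** bits
    step = 2 * radius / levels
    clipped = np.clip(v, -radius, radius - 1e-12)
    return (np.floor((clipped + radius) / step) + 0.5) * step - radius  # bin centers

def dq_gd(A, b, x0, eta, bits, radius, iters):
    """Differentially quantized GD (sketch): the worker quantizes the
    differential grad f(u) + (x - u)/eta, so the server iterate x stays within
    eta times the current quantization error of the unquantized trajectory u."""
    grad = lambda z: A.T @ (A @ z - b)
    x, u = x0.copy(), x0.copy()
    for _ in range(iters):
        v = grad(u) + (x - u) / eta       # fold the accumulated error into the message
        q = quantize(v, radius, bits)     # rate-limited message sent to the server
        x = x - eta * q                   # server-side update from the quantized message
        u = u - eta * grad(u)             # worker advances the unquantized trajectory
    return x

def naive_qgd(A, b, x0, eta, bits, radius, iters):
    """Naively quantized GD: the worker quantizes grad f(x) directly."""
    grad = lambda z: A.T @ (A @ z - b)
    x = x0.copy()
    for _ in range(iters):
        x = x - eta * quantize(grad(x), radius, bits)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 20))
    b = rng.standard_normal(200)
    x_star = np.linalg.lstsq(A, b, rcond=None)[0]
    eta = 1.0 / np.linalg.norm(A, 2) ** 2      # step size 1/L for smoothness constant L
    x0 = np.zeros(20)
    for bits in (2, 4, 8):
        xd = dq_gd(A, b, x0, eta, bits, radius=60.0, iters=300)
        xn = naive_qgd(A, b, x0, eta, bits, radius=60.0, iters=300)
        print(f"R = {bits} bits/dim: DQ-GD error {np.linalg.norm(xd - x_star):.1e}, "
              f"naive error {np.linalg.norm(xn - x_star):.1e}")
```

With this fixed-range quantizer, the naive scheme stalls at an error floor set by the quantization step, while DQ-GD tracks the unquantized trajectory up to a single step's quantization error; the paper's analysis, in which the quantizer's range shrinks along the trajectory, refines this contrast into the max{σ_GD, ρ_n 2^{-R}} versus σ_GD + ρ_n 2^{-R} rate comparison above.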
