Journal
IEEE TRANSACTIONS ON SIGNAL PROCESSING
Volume 69, Pages 998-1011
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TSP.2020.3048229
Keywords
Signal processing algorithms; Optimization; Error compensation; Compressors; Approximation algorithms; Machine learning algorithms; Convergence; Quantization; gradient methods; stochastic gradient descent
Funding
- Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation
- Swedish Research Council (Vetenskapsrådet) [2020-03607]
This paper examines the shift in the operating regime of optimization algorithms in the era of big data and shows how compensating for compression errors can improve solution accuracy. Experiments highlight the advantages of Hessian-aided error compensation on quadratic problems, and strong convergence guarantees are established for stochastic gradient descent.
The emergence of big data has caused a dramatic shift in the operating regime for optimization algorithms. The performance bottleneck, which used to be computations, is now often communications. Several gradient compression techniques have been proposed to reduce the communication load at the price of a loss in solution accuracy. Recently, it has been shown how compression errors can be compensated for in the optimization algorithm to improve the solution accuracy. Even though convergence guarantees for error-compensated algorithms have been established, there is very limited theoretical support for quantifying the observed improvements in solution accuracy. In this paper, we show that Hessian-aided error compensation, unlike other existing schemes, avoids accumulation of compression errors on quadratic problems. We also present strong convergence guarantees of Hessian-based error compensation for stochastic gradient descent. Our numerical experiments highlight the benefits of Hessian-based error compensation, and demonstrate that similar convergence improvements are attained when only a diagonal Hessian approximation is used.
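As a rough illustration of the error-compensation idea the abstract describes, the sketch below runs compressed gradient descent on a synthetic quadratic with a top-k sparsifier, comparing plain error feedback against a variant that damps the error memory with a diagonal Hessian. This is a minimal sketch, not the paper's exact algorithm: the quadratic problem, the top-k compressor, the step size, and the `(1 - gamma * H)` damping of the memory are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic quadratic f(x) = 0.5 x^T A x - b^T x (illustrative, not from the paper)
d = 20
A = np.diag(np.linspace(1.0, 10.0, d))  # diagonal so the "diagonal Hessian" is exact
b = rng.normal(size=d)
x_star = np.linalg.solve(A, b)          # minimizer, for measuring solution accuracy


def top_k(v, k=3):
    """Keep the k largest-magnitude entries and zero the rest (a common compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out


def run(gamma=0.05, T=2000, hessian_aided=True):
    x = np.zeros(d)
    e = np.zeros(d)          # compression-error memory
    H_diag = np.diag(A)      # diagonal Hessian approximation (exact here)
    for _ in range(T):
        g = A @ x - b        # full gradient of the quadratic
        c = top_k(g + e)     # compressed, error-compensated search direction
        resid = g + e - c    # compression error incurred at this step
        if hessian_aided:
            # Assumed Hessian-aided memory update: damp the error by (1 - gamma*H)
            e = (1.0 - gamma * H_diag) * resid
        else:
            e = resid        # plain error feedback
        x = x - gamma * c
    return np.linalg.norm(x - x_star)


err_hessian = run(hessian_aided=True)
err_plain = run(hessian_aided=False)
```

With deterministic gradients both variants drive the distance to the minimizer to (numerically) zero on this quadratic; the paper's accuracy gap between the two schemes shows up when the gradients are stochastic.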