Article

Distributed Bandit Online Convex Optimization With Time-Varying Coupled Inequality Constraints

Journal

IEEE TRANSACTIONS ON AUTOMATIC CONTROL
Volume 66, Issue 10, Pages 4620-4635

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TAC.2020.3030883

Keywords

Bandit convex optimization; distributed optimization; gradient approximation; online optimization; time-varying constraints

Funding

  1. Knut and Alice Wallenberg Foundation
  2. Swedish Foundation for Strategic Research
  3. Ministry of Education of Republic of Singapore under MoE Tier 1 Grant [RG72/19]
  4. National Natural Science Foundation of China [61991403, 61991404, 61991400]
  5. 2020 Science and Technology Major Project of Liaoning Province [2020JH1/10100008]
  6. Swedish Research Council


This paper studies distributed bandit online convex optimization with time-varying coupled inequality constraints, modeled as a repeated game between a group of learners and an adversary. By minimizing a sequence of global loss functions while satisfying a sequence of coupled constraint functions over repeated rounds, the proposed algorithms achieve sublinear expected regret and constraint violation.
Distributed bandit online convex optimization with time-varying coupled inequality constraints is considered, motivated by a repeated game between a group of learners and an adversary. The learners attempt to minimize a sequence of global loss functions and at the same time satisfy a sequence of coupled constraint functions, where the constraints are coupled across the distributed learners at each round. The global loss and coupled constraint functions are the sums of local convex loss and constraint functions, respectively, which are adaptively generated by the adversary. The local loss and constraint functions are revealed in a bandit manner, i.e., only the values of the loss and constraint functions are revealed to the learners at the sampling instance, and the revealed function values are held privately by each learner. Both one- and two-point bandit feedback are studied, with a corresponding distributed bandit online algorithm for each setting. We show that both algorithms achieve sublinear expected regret and constraint violation if the accumulated variation of the comparator sequence also grows sublinearly. In particular, O(T^theta) expected static regret and O(T^(7/4 - theta)) constraint violation are achieved in the one-point bandit feedback setting, and O(T^max{kappa, 1-kappa}) expected static regret and O(T^(1 - kappa/2)) constraint violation in the two-point bandit feedback setting, where theta ∈ (3/4, 5/6] and kappa ∈ (0, 1) are user-defined tradeoff parameters. Finally, the tightness of the theoretical results is illustrated by numerical simulations of a simple power grid example, which also compare the proposed algorithms with existing algorithms from the literature.
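As a rough illustration of the feedback model described above (a sketch of the standard one- and two-point bandit gradient estimators from the bandit convex optimization literature, not the paper's full distributed algorithm), the learners can estimate gradients from function values alone by perturbing the decision with a random unit vector. All function and variable names below are illustrative assumptions:

```python
import math
import random

def unit_sphere_sample(d, rng):
    """Sample a uniformly random unit vector in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def one_point_grad(f, x, delta, rng):
    """One-point estimator: (d / delta) * f(x + delta*u) * u.
    Uses a single function value per round, as in one-point bandit feedback."""
    d = len(x)
    u = unit_sphere_sample(d, rng)
    fx = f([xi + delta * ui for xi, ui in zip(x, u)])
    return [(d / delta) * fx * ui for ui in u]

def two_point_grad(f, x, delta, rng):
    """Two-point estimator: (d / (2*delta)) * (f(x+delta*u) - f(x-delta*u)) * u.
    Uses two function values per round, giving lower-variance estimates."""
    d = len(x)
    u = unit_sphere_sample(d, rng)
    fp = f([xi + delta * ui for xi, ui in zip(x, u)])
    fm = f([xi - delta * ui for xi, ui in zip(x, u)])
    return [(d / (2 * delta)) * (fp - fm) * ui for ui in u]

if __name__ == "__main__":
    rng = random.Random(0)
    f = lambda x: sum(xi * xi for xi in x)  # f(x) = ||x||^2, true gradient 2x
    x = [1.0, -2.0]
    # Averaging many two-point estimates should approach the true gradient [2, -4].
    n = 20000
    avg = [0.0, 0.0]
    for _ in range(n):
        g = two_point_grad(f, x, 1e-3, rng)
        avg = [a + gi / n for a, gi in zip(avg, g)]
    print(avg)
```

Both estimators are unbiased for a smoothed version of f; the variance gap between them is what drives the different regret rates in the one- and two-point settings.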

