4.7 Article

Q-Learning With Uniformly Bounded Variance

Journal

IEEE TRANSACTIONS ON AUTOMATIC CONTROL
Volume 67, Issue 11, Pages 5948-5963

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TAC.2021.3133184

Keywords

Complexity theory; Convergence; Approximation algorithms; Upper bound; Reinforcement learning; Costs; Standards; Q-learning; reinforcement learning (RL); stochastic optimal control

Funding

  1. ARO [W911NF2010055]
  2. National Science Foundation [EPCN 1935389]
  3. U.S. Department of Defense (DOD) [W911NF2010055] Funding Source: U.S. Department of Defense (DOD)

Ask authors/readers for more resources

This study aims to introduce a new class of algorithms with sample complexity uniformly bounded over all discount factors; meanwhile, it points out that prior bounds concern value function approximation rather than policy approximation.
Sample complexity bounds are a common performance metric in the reinforcement learning literature. In the discounted cost, infinite horizon setting, all of the known bounds can be arbitrarily large, as the discount factor approaches unity. These results seem to imply that a very large number of samples is required to achieve an epsilon-optimal policy. The objective of the present work is to introduce a new class of algorithms that have sample complexity uniformly bounded over all discount factors. One may argue that this is impossible, due to a recent min-max lower bound. The explanation is that these prior bounds concern value function approximation and not policy approximation. We show that the asymptotic covariance of the tabular Q-learning algorithm with an optimized step-size sequence is a quadratic function of a factor that goes to infinity, as discount factor approaches 1; an essentially known result. The new relative Q-learning algorithm proposed here is shown to have asymptotic covariance that is uniformly bounded over all discount factors.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available