Article

Convergence of Value Functions for Finite Horizon Markov Decision Processes with Constraints

Journal

Applied Mathematics and Optimization
Volume 84, Issue 2, Pages 2177-2220

Publisher

Springer
DOI: 10.1007/s00245-020-09707-x

Keywords

Stochastic control; Markov decision process; Value function; Generalized principal eigenvalue; Bellman equation

Funding

  1. JSPS KAKENHI Grant-in-Aid for Scientific Research [18K03343]

Abstract

This paper is concerned with finite horizon countable state Markov decision processes (MDPs) having an absorbing set as a constraint. Convergence of value iteration is discussed in order to investigate the asymptotic behavior of value functions as the time horizon tends to infinity. It turns out that the value function exhibits three different limiting behaviors according to the critical value λ*, the so-called generalized principal eigenvalue, of the associated ergodic problem. Specifically, we prove that (i) if λ* < 0, then the value function converges to a solution to the corresponding stationary equation; (ii) if λ* > 0, then, after a suitable normalization, it approaches a solution to the corresponding ergodic problem; (iii) if λ* = 0, then it diverges to infinity with, at most, a logarithmic order. We employ this convergence result to examine qualitative properties of the optimal Markovian policy for a finite horizon MDP when the time horizon is sufficiently large.
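
As a rough numerical illustration of the value iteration V_{n+1}(x) = max_a [ r(x,a) + Σ_y p(y|x,a) V_n(y) ] discussed in the abstract, the sketch below runs the recursion on a hypothetical toy chain whose state 0 plays the role of the absorbing set. This model (a safe "stay" action, a risky "explore" action, rewards g_stay and g_explore shifted by a constant c) is invented for illustration and is not the paper's setting; in this particular toy, the eigenvalue λ* crosses zero roughly at c = -max_x g_stay(x), which makes the trichotomy of the theorem visible in the printed diagnostics.

```python
import numpy as np

# Toy model (hypothetical, for illustration only): states {0, 1, ..., N},
# where state 0 stands in for the absorbing set of the constraint.
# Action "stay" remains at x with probability 1 (no absorption risk);
# action "explore" jumps uniformly over all states, so it may be absorbed.
rng = np.random.default_rng(1)
N = 20
g_stay = rng.uniform(-1.0, 0.0, size=N + 1)     # per-state reward of "stay"
g_explore = rng.uniform(-0.5, 0.5, size=N + 1)  # per-state reward of "explore"

def value_iteration(c, horizon=2000):
    """Iterate V_{n+1}(x) = max_a [ r_c(x,a) + sum_y p(y|x,a) V_n(y) ],
    keeping V_n = 0 on the absorbing state, and return V_n along the way
    at the state with the best safely sustainable reward."""
    x_star = 1 + int(np.argmax(g_stay[1:]))
    V = np.zeros(N + 1)
    trace = np.zeros(horizon)
    for n in range(horizon):
        stay = g_stay + c + V                # remain at x: no absorption risk
        explore = g_explore + c + V.mean()   # uniform jump: may hit state 0
        V = np.maximum(stay, explore)
        V[0] = 0.0                           # the absorbing-set constraint
        trace[n] = V[x_star]
    return trace

# In this toy, lambda* crosses 0 at c_crit = -max_x g_stay(x): above it,
# staying at the best state earns a positive reward per step forever.
c_crit = -g_stay[1:].max()
for c in (c_crit - 0.3, c_crit, c_crit + 0.3):
    tr = value_iteration(c)
    print(f"c - c_crit = {c - c_crit:+.1f}: "
          f"V_n = {tr[-1]:9.3f}, V_n / n = {tr[-1] / len(tr):+.4f} at n = 2000")
```

In this toy the three regimes show up as follows: below the critical shift, V_n converges (it is optimal to reach the absorbing set quickly); above it, V_n grows linearly at rate roughly λ* = c - c_crit; at the critical value, the growth is sublinear (here bounded, consistent with the paper's "at most logarithmic" bound).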
