Article

Convergence of Value Functions for Finite Horizon Markov Decision Processes with Constraints

Journal

Applied Mathematics and Optimization
Volume 84, Issue 2, Pages 2177-2220

Publisher

Springer
DOI: 10.1007/s00245-020-09707-x

Keywords

Stochastic control; Markov decision process; Value function; Generalized principal eigenvalue; Bellman equation

Funding

  1. JSPS KAKENHI Grant-in-Aid for Scientific Research [18K03343]

Abstract

This paper is concerned with finite horizon countable state Markov decision processes (MDPs) having an absorbing set as a constraint. Convergence of value iteration is discussed in order to investigate the asymptotic behavior of value functions as the time horizon tends to infinity. It turns out that the value function exhibits three different limiting behaviors according to the critical value λ*, the so-called generalized principal eigenvalue, of the associated ergodic problem. Specifically, we prove that (i) if λ* < 0, then the value function converges to a solution to the corresponding stationary equation; (ii) if λ* > 0, then, after a suitable normalization, it approaches a solution to the corresponding ergodic problem; (iii) if λ* = 0, then it diverges to infinity with, at most, a logarithmic order. We employ this convergence result to examine qualitative properties of the optimal Markovian policy for a finite horizon MDP when the time horizon is sufficiently large.
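
As a rough numerical illustration of the value iteration V_{n+1}(x) = max_a [ r(x,a) + Σ_y p(y|x,a) V_n(y) ] discussed in the abstract, the sketch below runs the recursion on a hypothetical toy chain whose state 0 plays the role of the absorbing set. This model (a safe "stay" action, a risky "explore" action, rewards g_stay and g_explore shifted by a constant c) is invented for illustration and is not the paper's setting; in this particular toy, the eigenvalue λ* crosses zero roughly at c = -max_x g_stay(x), which makes the trichotomy of the theorem visible in the printed diagnostics.

```python
import numpy as np

# Toy model (hypothetical, for illustration only): states {0, 1, ..., N},
# where state 0 stands in for the absorbing set of the constraint.
# Action "stay" remains at x with probability 1 (no absorption risk);
# action "explore" jumps uniformly over all states, so it may be absorbed.
rng = np.random.default_rng(1)
N = 20
g_stay = rng.uniform(-1.0, 0.0, size=N + 1)     # per-state reward of "stay"
g_explore = rng.uniform(-0.5, 0.5, size=N + 1)  # per-state reward of "explore"

def value_iteration(c, horizon=2000):
    """Iterate V_{n+1}(x) = max_a [ r_c(x,a) + sum_y p(y|x,a) V_n(y) ],
    keeping V_n = 0 on the absorbing state, and return V_n along the way
    at the state with the best safely sustainable reward."""
    x_star = 1 + int(np.argmax(g_stay[1:]))
    V = np.zeros(N + 1)
    trace = np.zeros(horizon)
    for n in range(horizon):
        stay = g_stay + c + V                # remain at x: no absorption risk
        explore = g_explore + c + V.mean()   # uniform jump: may hit state 0
        V = np.maximum(stay, explore)
        V[0] = 0.0                           # the absorbing-set constraint
        trace[n] = V[x_star]
    return trace

# In this toy, lambda* crosses 0 at c_crit = -max_x g_stay(x): above it,
# staying at the best state earns a positive reward per step forever.
c_crit = -g_stay[1:].max()
for c in (c_crit - 0.3, c_crit, c_crit + 0.3):
    tr = value_iteration(c)
    print(f"c - c_crit = {c - c_crit:+.1f}: "
          f"V_n = {tr[-1]:9.3f}, V_n / n = {tr[-1] / len(tr):+.4f} at n = 2000")
```

In this toy the three regimes show up as follows: below the critical shift, V_n converges (it is optimal to reach the absorbing set quickly); above it, V_n grows linearly at rate roughly λ* = c - c_crit; at the critical value, the growth is sublinear (here bounded, consistent with the paper's "at most logarithmic" bound).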
