Article

Convergence of Value Functions for Finite Horizon Markov Decision Processes with Constraints

Journal

APPLIED MATHEMATICS AND OPTIMIZATION
Volume 84, Issue 2, Pages 2177-2220

Publisher

SPRINGER
DOI: 10.1007/s00245-020-09707-x

Keywords

Stochastic control; Markov decision process; Value function; Generalized principal eigenvalue; Bellman equation

Funding

  1. JSPS KAKENHI [18K03343]
  2. Grants-in-Aid for Scientific Research [18K03343] Funding Source: KAKEN


This paper examines the asymptotic behavior of value functions for finite horizon countable state Markov decision processes with an absorbing set as a constraint. The value function exhibits three different limiting behaviors depending on the critical value λ*: it converges to a solution of the stationary equation, approaches a solution of the ergodic problem after normalization, or diverges to infinity at most with a logarithmic order. These results are used to investigate qualitative properties of the optimal Markovian policy for a finite horizon MDP with a sufficiently large time horizon.
This paper is concerned with finite horizon countable state Markov decision processes (MDPs) having an absorbing set as a constraint. Convergence of value iteration is discussed to investigate the asymptotic behavior of value functions as the time horizon tends to infinity. It turns out that the value function exhibits three different limiting behaviors according to the critical value λ*, the so-called generalized principal eigenvalue, of the associated ergodic problem. Specifically, we prove that (i) if λ* < 0, then the value function converges to a solution of the corresponding stationary equation; (ii) if λ* > 0, then, after a suitable normalization, it approaches a solution of the corresponding ergodic problem; (iii) if λ* = 0, then it diverges to infinity with, at most, a logarithmic order. We employ this convergence result to examine qualitative properties of the optimal Markovian policy for a finite horizon MDP when the time horizon is sufficiently large.
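
For orientation, the three equations named in the abstract can be sketched as follows. This is only an illustrative transcription under assumed notation (v_T for the horizon-T value function, c for the running cost, P_a for the controlled transition probabilities, φ and λ* for the ergodic pair); the paper's precise formulation, including the treatment of the absorbing set and its min/max and sign conventions, may differ.

```latex
% Illustrative sketch only; notation is assumed here, not taken from the paper.
\begin{align*}
  % Finite horizon value iteration (Bellman recursion) generating v_T:
  v_{T+1}(x) &= \inf_{a}\Big\{ c(x,a) + \sum_{y} P_a(x,y)\, v_T(y) \Big\},\\[2pt]
  % Stationary equation (case \lambda^* < 0 in the abstract: v_T converges to a solution v):
  v(x) &= \inf_{a}\Big\{ c(x,a) + \sum_{y} P_a(x,y)\, v(y) \Big\},\\[2pt]
  % Ergodic problem with generalized principal eigenvalue \lambda^*
  % (case \lambda^* > 0: v_T, after a suitable normalization, approaches a solution \phi):
  \lambda^{*} + \phi(x) &= \inf_{a}\Big\{ c(x,a) + \sum_{y} P_a(x,y)\, \phi(y) \Big\}.
\end{align*}
```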
