Article

Convergence of Value Functions for Finite Horizon Markov Decision Processes with Constraints

Journal

APPLIED MATHEMATICS AND OPTIMIZATION
Volume 84, Issue 2, Pages 2177-2220

Publisher

SPRINGER
DOI: 10.1007/s00245-020-09707-x

Keywords

Stochastic control; Markov decision process; Value function; Generalized principal eigenvalue; Bellman equation

Funding

  1. JSPS KAKENHI [18K03343]
  2. Grants-in-Aid for Scientific Research [18K03343] Funding Source: KAKEN


This paper examines the asymptotic behavior of value functions for finite horizon countable state Markov decision processes with an absorbing set as a constraint. The value function exhibits three different limiting behaviors depending on the critical value λ*: it converges to a solution of the stationary equation, approaches a solution of the ergodic problem after normalization, or diverges to infinity at most with a logarithmic order. These results are used to investigate qualitative properties of the optimal Markovian policy for a finite horizon MDP with a sufficiently large time horizon.
This paper is concerned with finite horizon countable state Markov decision processes (MDPs) having an absorbing set as a constraint. Convergence of value iteration is discussed to investigate the asymptotic behavior of value functions as the time horizon tends to infinity. It turns out that the value function exhibits three different limiting behaviors according to the critical value λ*, the so-called generalized principal eigenvalue, of the associated ergodic problem. Specifically, we prove that (i) if λ* < 0, then the value function converges to a solution of the corresponding stationary equation; (ii) if λ* > 0, then, after a suitable normalization, it approaches a solution of the corresponding ergodic problem; (iii) if λ* = 0, then it diverges to infinity with, at most, a logarithmic order. We employ this convergence result to examine qualitative properties of the optimal Markovian policy for a finite horizon MDP when the time horizon is sufficiently large.
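
For orientation, the three equations named in the abstract can be sketched as follows. This is only an illustrative transcription under assumed notation (v_T for the horizon-T value function, c for the running cost, P_a for the controlled transition probabilities, φ and λ* for the ergodic pair); the paper's precise formulation, including the treatment of the absorbing set and its min/max and sign conventions, may differ.

```latex
% Illustrative sketch only; notation is assumed here, not taken from the paper.
\begin{align*}
  % Finite horizon value iteration (Bellman recursion) generating v_T:
  v_{T+1}(x) &= \inf_{a}\Big\{ c(x,a) + \sum_{y} P_a(x,y)\, v_T(y) \Big\},\\[2pt]
  % Stationary equation (case \lambda^* < 0 in the abstract: v_T converges to a solution v):
  v(x) &= \inf_{a}\Big\{ c(x,a) + \sum_{y} P_a(x,y)\, v(y) \Big\},\\[2pt]
  % Ergodic problem with generalized principal eigenvalue \lambda^*
  % (case \lambda^* > 0: v_T, after a suitable normalization, approaches a solution \phi):
  \lambda^{*} + \phi(x) &= \inf_{a}\Big\{ c(x,a) + \sum_{y} P_a(x,y)\, \phi(y) \Big\}.
\end{align*}
```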
