Article

Partially Observable Total-Cost Markov Decision Processes with Weakly Continuous Transition Probabilities

MATHEMATICS OF OPERATIONS RESEARCH (2016)

Journal

MATHEMATICS OF OPERATIONS RESEARCH

Volume 41, Issue 2, Pages 656-681

Publisher

INFORMS

DOI: 10.1287/moor.2015.0746

Keywords

partially observable Markov decision processes; total cost; optimality inequality; optimal policy

Categories

Operations Research & Management Science; Mathematics, Applied

Funding

National Science Foundation [CMMI-0928490, CMMI-1335296]
Div Of Civil, Mechanical, & Manufact Inn
Directorate For Engineering [1335296] Funding Source: National Science Foundation

Abstract

This paper describes sufficient conditions for the existence of optimal policies for partially observable Markov decision processes (POMDPs) with Borel state, observation, and action sets, when the goal is to minimize the expected total costs over finite or infinite horizons. For infinite-horizon problems, one-step costs are either discounted or assumed to be nonnegative. Action sets may be noncompact and one-step cost functions may be unbounded. The introduced conditions are also sufficient for the validity of optimality equations, semicontinuity of value functions, and convergence of value iterations to optimal values. Since POMDPs can be reduced to completely observable Markov decision processes (COMDPs), whose states are posterior state distributions, this paper focuses on the validity of the above-mentioned optimality properties for COMDPs. The central question is whether the transition probabilities for the COMDP are weakly continuous. We introduce sufficient conditions for this and show that the transition probabilities for a COMDP are weakly continuous, if transition probabilities of the underlying Markov decision process are weakly continuous and observation probabilities for the POMDP are continuous in total variation. Moreover, the continuity in total variation of the observation probabilities cannot be weakened to setwise continuity. The results are illustrated with counterexamples and examples.
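The reduction from a POMDP to a COMDP mentioned in the abstract can be illustrated with a minimal sketch. It assumes finite state and observation sets, which is a simplification: the paper itself works with Borel sets. The function name `belief_update` and the arrays `P_a` and `Q_a` are hypothetical names chosen for this illustration, not notation taken from the paper. The sketch shows how a posterior state distribution (the COMDP state) is obtained from the prior distribution, the transition probabilities, and the observation probabilities.

```python
def belief_update(belief, P_a, Q_a, y):
    """One COMDP state transition for a finite POMDP (illustrative sketch).

    belief[x]  : prior probability of hidden state x
    P_a[x][x2] : probability of moving from x to x2 under the chosen action
    Q_a[x2][y] : probability of observing y when the new state is x2
    Returns the posterior distribution of the new state given observation y.
    """
    n = len(belief)
    # Prediction step: push the prior through the transition probabilities.
    predicted = [sum(belief[x] * P_a[x][x2] for x in range(n)) for x2 in range(n)]
    # Correction step: weight each new state by the observation likelihood.
    unnormalized = [predicted[x2] * Q_a[x2][y] for x2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    # Normalize so the posterior is a probability distribution.
    return [u / total for u in unnormalized]


# Hypothetical two-state, two-observation example (numbers are made up).
prior = [0.5, 0.5]
P_a = [[0.9, 0.1], [0.2, 0.8]]
Q_a = [[0.8, 0.2], [0.3, 0.7]]
posterior = belief_update(prior, P_a, Q_a, y=0)
```

In this finite setting the map from the prior to the posterior is given by an explicit formula. The abstract's central question concerns the analogous transition probabilities on general Borel spaces, where weak continuity needs the stated conditions on the transition and observation probabilities.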
