☆ 4.7 Article

Stochastic Approximation for Risk-Aware Markov Decision Processes

IEEE TRANSACTIONS ON AUTOMATIC CONTROL (2021)

Journal

IEEE TRANSACTIONS ON AUTOMATIC CONTROL

Volume 66, Issue 3, Pages 1314-1320

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TAC.2020.2989702

Keywords

Markov decision processes (MDPs); risk measure; saddle point; stochastic approximation; Q-learning

Categories

Automation & Control Systems; Engineering, Electrical & Electronic

Funding

SRIBD International Postdoctoral Fellowship
National Research Foundation, Prime Minister's Office, Singapore, under its Campus for Research Excellence and Technological Enterprise program
Singapore Ministry of Education Grant [R-266-000-083-133]
Singapore Ministry of Education Tier II Grant [MOE2015-T2-2-148]

AI Summary

A stochastic approximation algorithm is developed to solve risk-aware Markov decision processes. The algorithm covers several risk measures, and both its almost sure convergence and its convergence rate are established. For a given error tolerance $\epsilon > 0$ and learning rate $k \in (1/2, 1]$, the overall convergence rate is $\Omega\big((\ln(1/\delta\epsilon)/\epsilon^{2})^{1/k} + (\ln(1/\epsilon))^{1/(1-k)}\big)$ with probability at least $1-\delta$.

Abstract

We develop a stochastic approximation-type algorithm to solve finite state/action, infinite-horizon, risk-aware Markov decision processes. Our algorithm has two loops. The inner loop computes the risk by solving a stochastic saddle-point problem. The outer loop performs Q-learning to compute an optimal risk-aware policy. Several widely investigated risk measures (e.g., conditional value-at-risk, optimized certainty equivalent, and absolute semideviation) are covered by our algorithm. Almost sure convergence and the convergence rate of the algorithm are established. For an error tolerance $\epsilon > 0$ on the optimal Q-value estimation gap and a learning rate $k \in (1/2, 1]$, the overall convergence rate of our algorithm is $\Omega\big((\ln(1/\delta\epsilon)/\epsilon^{2})^{1/k} + (\ln(1/\epsilon))^{1/(1-k)}\big)$ with probability at least $1-\delta$.
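The two-loop structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. As an assumption, the inner loop estimates conditional value-at-risk through the Rockafellar-Uryasev form CVaR_alpha(X) = min over eta of eta + E[(X - eta)^+]/(1 - alpha), solved by stochastic subgradient steps, which is a plain minimization standing in for the paper's more general saddle-point problem. The function names, the cost-minimizing convention, the `step(s, a)` simulator interface, and the step size 1/n^k with k = 0.75 are all hypothetical choices made for illustration.

```python
def cvar_inner_loop(sample, alpha, n_iters, k=0.75):
    """Inner loop (assumed form): estimate CVaR_alpha of a random cost.

    Runs stochastic subgradient descent on eta for the objective
    eta + E[(X - eta)^+] / (1 - alpha), then plugs the final eta
    back into that objective using fresh samples.
    """
    eta = 0.0
    for t in range(1, n_iters + 1):
        x = sample()
        # Subgradient of the objective with respect to eta at sample x.
        grad = 1.0 - (1.0 if x > eta else 0.0) / (1.0 - alpha)
        eta -= grad / t ** k  # step size 1 / t^k, with k in (1/2, 1]
    tail = sum(max(sample() - eta, 0.0) for _ in range(n_iters)) / n_iters
    return eta + tail / (1.0 - alpha)


def risk_aware_q_learning(n_states, n_actions, step, alpha, gamma,
                          outer_iters, inner_iters, k=0.75):
    """Outer loop (assumed form): Q-learning on risk-adjusted targets.

    `step(s, a)` is a hypothetical simulator returning (cost, next_state).
    Costs are minimized, so the greedy value of a state is min(Q[state]).
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for n in range(1, outer_iters + 1):
        theta = 1.0 / n ** k  # outer learning rate
        for s in range(n_states):
            for a in range(n_actions):
                def sample(s=s, a=a):
                    cost, s_next = step(s, a)
                    return cost + gamma * min(Q[s_next])
                # Risk of the one-step cost-to-go, estimated by the inner loop.
                risk = cvar_inner_loop(sample, alpha, inner_iters, k)
                Q[s][a] = (1.0 - theta) * Q[s][a] + theta * risk
    return Q
```

As a sanity check, a one-state problem with deterministic costs 1 and 2 for its two actions and discount factor 0.5 has risk-aware Q-values 2 and 3 under any risk level, since CVaR of a constant is that constant. Running the sketch with 60 outer and 2000 inner iterations should land close to those values.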
