Journal
AUTOMATICA
Volume 50, Issue 2, Pages 475-489
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.automatica.2013.12.009
Keywords
LQR; Generalized policy iteration; Reinforcement learning; Adaptive control; Optimization under uncertainties
Funding
- Institute of BioMed-IT, Energy-IT and Smart-IT Technology (BEST), a Brain Korea 21 plus program, Yonsei University
Abstract
This paper mathematically analyzes the integral generalized policy iteration (I-GPI) algorithms applied to a class of continuous-time linear quadratic regulation (LQR) problems with the unknown system matrix A. GPI is the general idea of interleaving the policy evaluation and policy improvement steps of policy iteration (PI) to compute the optimal policy. We first introduce the update horizon h̄, and then show that (i) all of the I-GPI methods with the same h̄ can be considered equivalent and that (ii) the value function approximated in the policy evaluation step monotonically converges to the exact one as h̄ → ∞. This reveals the relation between the computational complexity and the update (or time) horizon of I-GPI, as well as between I-PI and I-GPI in the limit h̄ → ∞. We also provide and discuss two modes of convergence of I-GPI: in one mode, I-GPI behaves like PI; in the other, it performs like value iteration for discrete-time LQR and infinitesimal GPI (h̄ → 0). From these results, a new classification of integral reinforcement learning is formed with respect to h̄. Two matrix inequality conditions for stability, the region of local monotone convergence, and data-driven (adaptive) implementation methods are also provided with detailed discussion. Numerical simulations are carried out for verification and further investigation. © 2013 Elsevier Ltd. All rights reserved.
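To make the role of the update horizon h̄ concrete, the following is a minimal, model-based sketch of GPI for continuous-time LQR. It assumes known (A, B) purely for illustration, whereas the paper's I-GPI handles the unknown A using measured trajectory data; the function names, the toy system, and the chosen horizon are all hypothetical, not taken from the paper.

```python
# A minimal, model-based sketch of GPI for continuous-time LQR, written to
# illustrate the role of the update horizon h_bar. It assumes known (A, B),
# whereas the paper's I-GPI handles unknown A using measured trajectories;
# the toy system, gains, and horizon below are hypothetical, not from the paper.
import numpy as np
from scipy.linalg import expm, solve, solve_continuous_are

def gpi_value_update(P, K, A, B, Q, R, h_bar, n_steps=200):
    """One policy-evaluation sweep over the update horizon h_bar:
        P_next = Phi^T P Phi + int_0^h_bar e^{Ac^T s} Qc e^{Ac s} ds,
    where Ac = A - B K, Qc = Q + K^T R K, and Phi = e^{Ac h_bar}. As
    h_bar -> infinity this tends to the Lyapunov solution of exact PI."""
    Ac = A - B @ K
    Qc = Q + K.T @ R @ K
    ds = h_bar / n_steps
    integral = np.zeros_like(P)
    for k in range(n_steps + 1):          # trapezoidal quadrature
        w = 0.5 if k in (0, n_steps) else 1.0
        E = expm(Ac * (k * ds))
        integral += w * ds * (E.T @ Qc @ E)
    Phi = expm(Ac * h_bar)
    return Phi.T @ P @ Phi + integral

def policy_improvement(P, B, R):
    """Greedy policy improvement: K = R^{-1} B^T P."""
    return solve(R, B.T @ P)

A = np.array([[0.0, 1.0], [-1.0, 2.0]])   # unstable open loop
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
K = np.array([[0.0, 5.0]])                # an initial stabilizing gain
P = np.zeros((2, 2))
for _ in range(40):                        # interleave evaluation/improvement
    P = gpi_value_update(P, K, A, B, Q, R, h_bar=1.0)
    K = policy_improvement(P, B, R)

P_star = solve_continuous_are(A, B, Q, R)  # Riccati solution for reference
print("max |P - P*| =", np.max(np.abs(P - P_star)))  # should shrink with iterations
```

Shrinking h_bar toward zero makes each sweep an infinitesimal value update, while letting it grow recovers exact policy iteration per evaluation step, matching the two limiting regimes discussed in the abstract.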