Journal
AUTOMATICA
Volume 48, Issue 11, Pages 2850-2859
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.automatica.2012.06.008
Keywords
Q-learning; Adaptive control; LQR; Policy iteration; Optimization under uncertainties
Funding
- Brain Korea 21 Project
- Human Resources Development of the Korea Institute of Energy Technology Evaluation and Planning (KETEP)
- Korean Government Ministry of Knowledge Economy [20104010100590]
- Korea Evaluation Institute of Industrial Technology (KEIT) [20124030200040, 20104010100590]
Abstract
This paper proposes an integral Q-learning algorithm for continuous-time (CT) linear time-invariant (LTI) systems, which solves a linear quadratic regulation (LQR) problem in real time for a given system and value function, without knowledge of the system matrices A and B. Here, Q-learning refers to a family of reinforcement learning methods that find the optimal policy through interaction with an uncertain environment. In developing the algorithm, we first present an explorized policy iteration (PI) method that can handle known exploration signals. The integral Q-learning algorithm for CT LTI systems is then derived from this PI method and from variants of Q-functions obtained through a singular perturbation of the control input. The proposed Q-learning scheme evaluates the current value function and the improved control policy at the same time, and is proven to be stable and convergent to the LQ optimal solution, provided that the initial policy is stabilizing. Practical online implementation of the proposed algorithms is investigated in terms of persistency of excitation (PE) and exploration. Finally, simulation results are provided for comparison and verification of the performance. (c) 2012 Elsevier Ltd. All rights reserved.
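The abstract's PI scheme builds on classical model-based policy iteration for the CT LQR problem (Kleinman's iteration): alternately evaluate the current stabilizing gain by solving a Lyapunov equation, then improve it from the resulting value matrix. The paper's contribution is doing this model-free via integral Q-learning; the sketch below is only the model-based baseline, with a hypothetical example system (A, B, Q, R, K0 chosen for illustration, not from the paper).

```python
import numpy as np

def lyap_ct(Ac, Qc):
    """Solve the continuous-time Lyapunov equation Ac^T P + P Ac + Qc = 0
    by Kronecker-product vectorization (column-major vec convention)."""
    n = Ac.shape[0]
    M = np.kron(np.eye(n), Ac.T) + np.kron(Ac.T, np.eye(n))
    vecP = np.linalg.solve(M, -Qc.flatten(order='F'))
    return vecP.reshape((n, n), order='F')

def kleinman_pi(A, B, Q, R, K0, iters=20):
    """Model-based policy iteration for CT LQR, starting from a
    stabilizing gain K0; converges to the ARE solution P and K = R^-1 B^T P."""
    K = K0
    for _ in range(iters):
        Ak = A - B @ K                       # closed-loop dynamics
        Qk = Q + K.T @ R @ K                 # stage cost under current policy
        P = lyap_ct(Ak, Qk)                  # policy evaluation
        K = np.linalg.solve(R, B.T @ P)      # policy improvement
    return P, K

# Hypothetical double-integrator-like example
A = np.array([[0., 1.], [0., -1.]])
B = np.array([[0.], [1.]])
Q = np.eye(2)
R = np.array([[1.]])
K0 = np.array([[1., 1.]])                    # stabilizing initial gain
P, K = kleinman_pi(A, B, Q, R, K0)

# Residual of the algebraic Riccati equation should vanish at convergence
res = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
print(np.linalg.norm(res))
```

The stabilizing-initial-policy requirement in the abstract corresponds to the assumption on K0 here: the Lyapunov step is only well-posed when A - B K is Hurwitz at every iteration, which Kleinman's result guarantees inductively.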