Article

Policy Gradient Adaptive Critic Designs for Model-Free Optimal Tracking Control With Experience Replay

Journal

IEEE Transactions on Systems, Man, and Cybernetics: Systems

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TSMC.2021.3071968

Keywords

Mathematical model; Optimal control; Heuristic algorithms; Adaptation models; Stability criteria; Performance analysis; Convergence; Adaptive critic designs (ACD); adaptive dynamic programming (ADP); experience replay (ER); model-free control; off-policy learning; optimal tracking control

Funding

  1. National Natural Science Foundation of China [62073085, 61973330, 61773075, 61533017]
  2. Beijing Natural Science Foundation [4212038]
  3. Guangdong Introducing Innovative and Entrepreneurial Teams of the Pearl River Talent Recruitment Program [2019ZT08X340]
  4. State Key Laboratory of Synthetical Automation for Process Industries [2019-KF23-03]
  5. Open Research Project of the State Key Laboratory of Industrial Control Technology, Zhejiang University, China [ICT2021B48]

Abstract

A model-free optimal tracking controller is designed for discrete-time nonlinear systems through policy gradient adaptive critic designs (PGACDs) with experience replay (ER). By using system transformation, optimal tracking control problems are converted into optimal regulation problems. An off-policy PGACD algorithm is developed to minimize the iterative Q-function and improve the tracking control performance. The proposed method is realized based on the critic network and the actor network (AN), which are applied to approximate the iterative Q-function and the iterative control policy, respectively. Then, the policy gradient technique is introduced to derive a novel weight updating law of the AN explicitly by using measured system data only. The convergence of the iteration is established through theoretical analysis, and the uniform ultimate boundedness is demonstrated for the closed-loop system under the PGACD-based controller by using Lyapunov's direct method. To guarantee the stability and increase the data usage efficiency of the learning process, an ER-based learning framework is designed to improve the realizability of the proposed method. Finally, simulation results of two examples are provided to demonstrate the performance of the off-policy PGACD algorithm.
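
To make the learning loop summarized above concrete, below is a minimal numpy sketch of an off-policy actor-critic iteration with experience replay: a quadratic critic approximates the Q-function, the actor is adjusted along the critic's gradient with respect to the control, and both are trained only from stored, measured transitions. The scalar error dynamics, quadratic features, stage cost, gains, and learning rates are illustrative assumptions for this sketch; they are not the paper's PGACD algorithm, its convergence conditions, or its benchmark examples.

```python
import numpy as np

# Hypothetical transformed error system (tracking already converted to a
# regulation problem, as the abstract describes): e_{k+1} = 0.8*sin(e_k) + u_k.
def step(e, u):
    return 0.8 * np.sin(e) + u

def utility(e, u, q=1.0, r=0.1):
    return q * e ** 2 + r * u ** 2          # stage cost U(e, u)

rng = np.random.default_rng(0)
gamma = 0.95                                 # discount factor
W_c = np.zeros(3)                            # critic weights, Q ~ W_c . phi(e, u)
w_a = -0.3                                   # actor weight, u = w_a * e

def phi(e, u):
    return np.array([e * e, e * u, u * u])   # quadratic critic features

def q_value(e, u):
    return W_c @ phi(e, u)

def actor(e):
    return w_a * e

# Off-policy data collection with an exploratory behaviour policy; only the
# measured transitions (e, u, cost, e_next) are kept in the replay buffer.
buffer = []
e = 1.0
for _ in range(400):
    u = -0.5 * e + 0.3 * rng.standard_normal()
    e_next = step(e, u)
    buffer.append((e, u, utility(e, u), e_next))
    e = e_next if abs(e_next) < 3.0 else rng.uniform(-1.5, 1.5)

# Learning from replayed minibatches: a normalized gradient step on the
# critic's one-step Bellman residual, then a policy-gradient step on the actor.
lr_c, lr_a = 0.5, 0.01
for _ in range(2000):
    for i in rng.integers(len(buffer), size=32):
        e_i, u_i, cost_i, e_n = buffer[i]
        target = cost_i + gamma * q_value(e_n, actor(e_n))   # bootstrapped target
        td_err = q_value(e_i, u_i) - target
        f = phi(e_i, u_i)
        W_c = W_c - lr_c * td_err * f / (1.0 + f @ f)        # critic update
        dq_du = W_c[1] * e_i + 2.0 * W_c[2] * actor(e_i)     # dQ/du at u = actor(e)
        w_a = w_a - lr_a * dq_du * e_i                       # actor step: dQ/du * du/dw_a

# For this toy system the actor gain should settle near the discounted LQR
# gain of the linearized error dynamics.
print("learned actor gain w_a:", w_a)
print("learned critic weights W_c:", W_c)
```

The sketch keeps the roles separated the same way the abstract does: the critic update uses only data drawn from the buffer (off-policy learning), while the actor never needs a system model because the policy gradient is taken through the learned Q-function.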
