Article

Optimized Backstepping Control Using Reinforcement Learning of Observer-Critic-Actor Architecture Based on Fuzzy System for a Class of Nonlinear Strict-Feedback Systems

Journal

IEEE TRANSACTIONS ON FUZZY SYSTEMS
Volume 30, Issue 10, Pages 4322-4335

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TFUZZ.2022.3148865

Keywords

Fuzzy logic; Optimal control; Observers; Backstepping; Mathematical models; Fuzzy sets; Training; Nonlinear strict feedback system; optimized backstepping (OB); reinforcement learning (RL); state observer; unmeasured state

Funding

  1. National Natural Science Foundation of China [62073045, 61973185, 61873151]
  2. Shandong Provincial Natural Science Foundation, China [ZR2020MF097]
  3. Taishan Scholar Project of Shandong Province of China [TSQN201909078]
  4. Development Plan of Young Innovation Team in Colleges and Universities of Shandong Province [2019KJN011]

Abstract

This article proposes a fuzzy logic system (FLS)-based adaptive optimized backstepping control using reinforcement learning (RL) strategy for a class of nonlinear strict feedback systems with unmeasured states. The proposed method constructs an observer-critic-actor architecture based on FLS approximations in each backstepping step to optimize the virtual and actual controls. The observer estimates the unmeasurable states while the critic and actor evaluate control performance and perform control behavior, respectively. The optimized control method avoids the requirement of design constants and simplifies the RL algorithm, making it more easily applicable and widely extendable.
In this article, a fuzzy logic system (FLS)-based adaptive optimized backstepping control is developed by employing a reinforcement learning (RL) strategy for a class of nonlinear strict-feedback systems with unmeasured states. To make the virtual and actual controls the optimized solutions of their corresponding subsystems, RL with an observer-critic-actor architecture based on FLS approximations is constructed in every backstepping step, where the observer estimates the unmeasurable states, and the critic and actor evaluate the control performance and perform the control behavior, respectively. In the proposed optimized control, on the one hand, the state observer method avoids requiring the design constants that make its characteristic polynomial Hurwitz, a condition universally demanded in existing observer methods; on the other hand, the RL algorithm is significantly simpler, because the critic and actor training laws are derived from the negative gradient of a simple positive function, which is produced from the partial derivative of the Hamilton-Jacobi-Bellman (HJB) equation rather than from the square of the approximated HJB equation. Therefore, this optimized scheme can be more easily applied and widely extended. Finally, both theoretical analysis and simulation demonstrate that the desired objective is fulfilled.
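The following is a minimal sketch of the idea summarized in the abstract, not the paper's exact formulation: the symbols f_i, phi_i, J_i, H_i, P_i, the critic/actor weights W_{c,i}, W_{a,i}, and the learning gains gamma_c, gamma_a are placeholder notation introduced here for illustration, assuming a generic i-th subsystem and a quadratic cost.

% Sketch only: generic i-th subsystem \dot{x}_i = f_i(x_i) + \alpha_i,
% with cost J_i = \int_t^{\infty} ( x_i^2(\tau) + \alpha_i^2(\tau) ) \, d\tau.
\begin{align}
  % HJB equation associated with the optimal virtual control \alpha_i^*
  H_i\!\left(x_i,\alpha_i,\tfrac{\partial J_i^*}{\partial x_i}\right)
    &= x_i^2 + \alpha_i^2
       + \frac{\partial J_i^*}{\partial x_i}\bigl(f_i(x_i)+\alpha_i\bigr) = 0, \\
  % Critic and actor built from FLS basis functions \varphi_i(x_i)
  \frac{\partial \hat{J}_i}{\partial x_i}
    &= \hat{W}_{c,i}^{\top}\varphi_i(x_i) \ (\text{critic}), \qquad
  \hat{\alpha}_i = -\tfrac{1}{2}\,\hat{W}_{a,i}^{\top}\varphi_i(x_i) \ (\text{actor}), \\
  % Training laws: negative gradient of a positive function P_i built from
  % \partial H_i / \partial \alpha_i, rather than from the squared HJB residual H_i^2
  \dot{\hat{W}}_{c,i} &= -\gamma_c \frac{\partial P_i}{\partial \hat{W}_{c,i}}, \qquad
  \dot{\hat{W}}_{a,i} = -\gamma_a \frac{\partial P_i}{\partial \hat{W}_{a,i}},
  \qquad P_i \ge 0 .
\end{align}

Under these placeholder definitions, driving the weights along the negative gradient of the nonnegative function P_i (constructed from the partial derivative of the HJB equation) is what the abstract contrasts with the more common approach of minimizing the square of the approximated HJB residual.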
