Article

Asynchronous learning for actor-critic neural networks and synchronous triggering for multiplayer system

Journal

ISA TRANSACTIONS
Volume 129, Issue -, Pages 295-308

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.isatra.2022.02.007

Keywords

Nonzero-sum differential game; Neural network; Actor-critic; Asynchronous learning; Synchronous triggering; Event-triggered communication

Funding

  1. National Natural Science Foundation of China [62022061]
  2. Tianjin Natural Science Foundation [20JCYBJC00880]

This paper develops a novel asynchronous learning algorithm with event-triggered communication, based on an actor-critic neural network structure and a reinforcement learning scheme, to adaptively solve for the Nash equilibrium of a multiplayer nonzero-sum differential game. The algorithm is validated on a four-player nonlinear system and applied to adaptive cruise control of a nonlinear vehicle system, demonstrating its effectiveness.
Building on an actor-critic neural network structure and a reinforcement learning scheme, this paper develops a novel asynchronous learning algorithm with event-triggered communication to adaptively solve for the Nash equilibrium of a multiplayer nonzero-sum differential game. From an optimal control viewpoint, each player (local controller) seeks a policy that minimizes its own infinite-horizon cost function. In this learning framework, each player consists of one critic and one actor and implements distributed asynchronous policy iteration to optimize its decision-making process. In addition, a central event generator effectively reduces the communication burden between the system and the players. The critic network performs fast updates by gradient-descent adaptation, while the actor network performs event-induced updates using gradient projection. Asymptotic stability of the closed-loop system is ensured, along with uniform ultimate convergence. The effectiveness of the proposed algorithm is then substantiated on a four-player nonlinear system, showing that it can significantly reduce the number of samples without impairing learning accuracy. Finally, by leveraging the nonzero-sum game idea, the proposed learning scheme is applied to the lateral-directional stability of a linear aircraft system and further extended to a nonlinear vehicle system to achieve adaptive cruise control. (c) 2022 ISA. Published by Elsevier Ltd. All rights reserved.
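To make the idea of a central event generator more concrete, the following is a minimal sketch of event-triggered sampling shared by two players. It is not the paper's algorithm: the scalar plant, the fixed feedback gains `k1` and `k2` (standing in for learned actor networks), the relative-threshold triggering rule, and the function name `simulate` are all assumptions chosen only for illustration.

```python
def simulate(threshold, steps=2000, dt=0.005):
    """Toy scalar two-player system with one shared event generator.

    Hypothetical setup (not from the paper):
        dx/dt = a*x + b1*u1 + b2*u2
    Both players use fixed linear feedback on the last broadcast state.
    Returns the final state and the number of broadcasts (events).
    """
    a, b1, b2 = 0.5, 1.0, 1.0
    k1, k2 = 1.0, 1.5      # illustrative gains, not Nash-optimal policies
    x = 1.0                # true plant state
    x_held = x             # last state sample broadcast to the players
    events = 1             # the initial broadcast counts as one event

    for _ in range(steps):
        # Central event generator: a single test decides when ALL players
        # receive a fresh sample (assumed relative-threshold rule).
        if abs(x - x_held) > threshold * abs(x):
            x_held = x
            events += 1

        # Each player acts only on the held sample between events.
        u1 = -k1 * x_held
        u2 = -k2 * x_held

        # Forward-Euler integration of the plant.
        x += dt * (a * x + b1 * u1 + b2 * u2)

    return x, events


if __name__ == "__main__":
    x_every, n_every = simulate(threshold=0.0)   # resample at nearly every step
    x_event, n_event = simulate(threshold=0.2)   # resample only on events
    print("threshold 0.0:", n_every, "broadcasts, final state", x_every)
    print("threshold 0.2:", n_event, "broadcasts, final state", x_event)
```

In this toy setting, a nonzero threshold should yield far fewer broadcasts than resampling at every step while the state still decays toward zero, which mirrors in spirit the abstract's claim of fewer samples without loss of accuracy. The learning components (critic gradient descent and actor gradient projection) are deliberately omitted here.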
