Article

Multiplayer Stackelberg-Nash Game for Nonlinear System via Value Iteration-Based Integral Reinforcement Learning

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNNLS.2020.3042331

Keywords

Games; Heuristic algorithms; Mathematical model; Approximation algorithms; Decision making; System dynamics; Nonlinear dynamical systems; Integral reinforcement learning (IRL); multiplayer Stackelberg-Nash game (SNG); neural networks (NNs); nonlinear system; value iteration (VI)

Funding

  1. National Natural Science Foundation of China [61922076, 61873252]
  2. Fok Ying-Tong Education Foundation for Young Teachers in Higher Education Institutions of China [161059]
  3. Anhui Department of Science and Technology [201903a05020049]
  4. Tencent Holdings Ltd. [FR202003]
  5. Research Grants Council of the Hong Kong Special Administrative Region of China [CityU 11202819, CityU 11200717]

Abstract

In this article, we study a multiplayer Stackelberg-Nash game (SNG) for a nonlinear dynamical system with one leader and multiple followers. At the upper level, the leader makes its decision first, taking into account the reaction functions of all followers, while, at the lower level, the followers respond optimally to the leader's strategy by simultaneously playing a Nash game among themselves. First, the optimal strategies for the leader and the followers are derived from the bottom level up, and these strategies are shown to constitute the Stackelberg-Nash equilibrium points. Then, to overcome the difficulty of computing the equilibrium points analytically, we develop a novel two-level value iteration-based integral reinforcement learning (VI-IRL) algorithm that relies only on partial knowledge of the system dynamics. We establish that the proposed method converges asymptotically to the equilibrium strategies under weak coupling conditions. Moreover, we introduce effective termination criteria that guarantee the admissibility of the policy (strategy) profile obtained after a finite number of iterations of the algorithm. In the implementation of our scheme, we employ neural networks (NNs) to approximate the value functions and use least-squares methods to update the corresponding weights. Finally, the effectiveness of the developed algorithm is verified by two simulation examples.
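
The abstract does not state the cost functionals or dynamics explicitly; the following sketch uses the input-affine, quadratic-in-control setting that is standard in this literature, so every symbol below is an assumption rather than the paper's notation. With dynamics and costs

\[
\dot{x} = f(x) + g_0(x)\,u_0 + \sum_{i=1}^{N} g_i(x)\,u_i ,
\qquad
J_j = \int_0^{\infty} \Big( Q_j(x) + \sum_{k=0}^{N} u_k^{\top} R_{jk}\, u_k \Big)\, dt ,
\]

the lower-level followers, for a fixed leader strategy \(u_0\), form a Nash equilibrium,

\[
J_i\big(u_0, u_i^{*}, u_{-i}^{*}\big) \le J_i\big(u_0, u_i, u_{-i}^{*}\big), \qquad i = 1, \dots, N,
\]

while at the upper level the leader minimizes its own cost anticipating those reactions,

\[
u_0^{*} = \arg\min_{u_0} J_0\big(u_0, u_1^{*}(u_0), \dots, u_N^{*}(u_0)\big).
\]

The IRL step evaluates each value function through the integral Bellman equation over a reinforcement interval \(T > 0\),

\[
V_j\big(x(t)\big) = \int_{t}^{t+T} r_j\big(x(\tau), u_0(\tau), \dots, u_N(\tau)\big)\, d\tau + V_j\big(x(t+T)\big),
\]

which avoids explicit knowledge of the drift term \(f(x)\) and is what makes only partial information of the system dynamics necessary.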
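
As a concrete illustration of the NN value-function approximation with least-squares weight updates mentioned in the abstract, here is a minimal Python sketch of one critic update, assuming a linear-in-weights approximator V(x) ≈ wᵀφ(x) with a hand-chosen polynomial basis. The basis, the sample format, and the function names are illustrative assumptions, not the paper's implementation.

import numpy as np

def phi(x):
    # Illustrative even polynomial basis for a 2-D state; the paper's
    # actual NN architecture is not specified in the abstract.
    x1, x2 = x
    return np.array([x1**2, x1 * x2, x2**2, x1**4, x2**4])

def vi_irl_critic_update(w_prev, samples):
    # samples: list of (x_t, c_tT, x_tT), where c_tT is the running cost
    # integrated over [t, t+T] along a measured trajectory, so no model of
    # the drift dynamics is required (the partial-model-free feature).
    Phi = np.stack([phi(x_t) for x_t, _, _ in samples])
    # Value-iteration targets from the integral Bellman equation, with the
    # terminal value evaluated using the previous weights.
    targets = np.array([c + w_prev @ phi(x_T) for _, c, x_T in samples])
    # Least-squares fit of the new critic weights to the Bellman targets.
    w_new, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return w_new

In use, one would iterate w = vi_irl_critic_update(w, samples) until the weight change falls below a tolerance, and then apply a termination check in the spirit of the paper's admissibility criteria before deploying the resulting policy profile.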

