Article

Policy-Iteration-Based Learning for Nonlinear Player Game Systems With Constrained Inputs

Journal

IEEE Transactions on Systems, Man, and Cybernetics: Systems
Volume 51, Issue 10, Pages 6488-6502

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TSMC.2019.2962629

Keywords

Games; Optimal control; Mathematical model; Nash equilibrium; Heuristic algorithms; Approximation algorithms; Optimization; Control constraint; critic-actor; neural network (NN); nonzero-sum (NZS) differential game; policy iteration (PI)

Funding

  1. National Natural Science Foundation of China [61773284, 61921004]


This article investigates the optimal control problem for nonlinear nonzero-sum (NZS) differential games in the absence of initial admissible policies and in the presence of control constraints. An adaptive learning algorithm is developed based on the policy iteration (PI) technique to approximately obtain the Nash equilibrium using real-time data. A two-player continuous-time system is used to present this approximation mechanism, which is implemented as a critic-actor architecture for each player. The constraint is incorporated into the optimization by introducing a nonquadratic value function, and the associated constrained Hamilton-Jacobi equation is derived. A critic neural network (NN) and an actor NN are utilized to learn the value function and the optimal control policy, respectively, via novel weight-tuning laws. To guarantee stability during the learning phase, two stable operators are designed for the two actors. The proposed algorithm is proved to converge in the manner of a Newton iteration, and the stability of the closed-loop system is ensured by Lyapunov analysis. Finally, two simulation examples demonstrate the effectiveness of the proposed learning scheme under different constraint scenarios.
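A minimal sketch of the policy-iteration loop the abstract describes, under simplifying assumptions: instead of the paper's nonlinear two-player game with NN approximators, it runs Kleinman-style PI on a scalar linear-quadratic analogue where policy evaluation has a closed form, then applies the tanh-saturated control law that the abstract's nonquadratic value function induces. All dynamics, costs, and variable names here are illustrative assumptions, not the paper's actual examples.

```python
import math

# Hypothetical scalar system x' = a*x + b*u with stage cost q*x^2 + r*u^2.
a, b = 1.0, 1.0
q, r = 1.0, 1.0

def evaluate(k):
    """Policy evaluation: for the policy u = -k*x, solve the scalar
    Lyapunov equation 2*(a - b*k)*p + q + r*k**2 = 0 for p, where the
    value function is V(x) = p*x^2 (valid for stabilizing k, b*k > a)."""
    return (q + r * k * k) / (2.0 * (b * k - a))

def improve(p):
    """Policy improvement: k = R^{-1} B^T P in scalar form."""
    return b * p / r

k = 3.0              # initial stabilizing gain (b*k > a)
for _ in range(10):  # PI converges quadratically, like Newton's method
    k = improve(evaluate(k))

# Compare against the algebraic Riccati solution for this scalar problem.
p_star = r * (a + math.sqrt(a * a + q * b * b / r)) / (b * b)
print(abs(k - b * p_star / r) < 1e-9)  # True: PI reached the optimal gain

# Input constraint |u| <= lam via a nonquadratic cost of the kind the
# abstract mentions: the improved policy becomes tanh-saturated,
# u(x) = -lam * tanh(b*p*x / (lam*r)), so it never exceeds the bound.
lam = 2.0
p = evaluate(k)
u = lambda x: -lam * math.tanh(b * p * x / (lam * r))
print(abs(u(100.0)) <= lam)  # True: control stays within |u| <= lam
```

The loop alternates closed-form evaluation and improvement; in the paper this closed form is unavailable, which is precisely why the critic NN (value) and actor NN (policy) with online weight-tuning laws take its place.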

