☆ 4.7 Article

Adaptive Optimal Control for Stochastic Multiplayer Differential Games Using On-Policy and Off-Policy Reinforcement Learning

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2020)

期刊

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

卷 31, 期 12, 页码 5522-5533

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TNNLS.2020.2969215

关键词

Games; Uncertainty; System dynamics; Nash equilibrium; Optimal control; Time-varying systems; Integral reinforcement learning (IRL); multiplayer systems; neural networks (NNs); systems with randomly time-varying parameters; uncertainty quantification

类别

Computer Science, Artificial Intelligence Computer Science, Hardware & Architecture Computer Science, Theory & Methods Engineering, Electrical & Electronic

资金

Office of Naval Research (ONR) [N00014-18-1-2221]
NSF [1714519, 1839804]
Directorate For Engineering
Div Of Electrical, Commun & Cyber Sys [1839804] Funding Source: National Science Foundation

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Control-theoretic differential games have been used to solve optimal control problems in multiplayer systems. Most existing studies on differential games either assume deterministic dynamics or dynamics corrupted with additive noise. In realistic environments, multidimensional environmental uncertainties often modulate system dynamics in a more complicated fashion. In this article, we study stochastic multiplayer differential games, where the players' dynamics are modulated by randomly time-varying parameters. We first formulate two differential games for systems of general uncertain linear dynamics, including the two-player zero-sum and multiplayer nonzero-sum games. We then show that optimal control policies, which constitute the Nash equilibrium solutions, can be derived from the corresponding Hamiltonian functions. Stability is proven using the Lyapunov type of analysis. In order to solve the stochastic differential games online, we integrate reinforcement learning (RL) and an effective uncertainty sampling method called the multivariate probabilistic collocation method (MPCM). Two learning algorithms, including the on-policy integral RL (IRL) and off-policy IRL, are designed for the formulated games, respectively. We show that the proposed learning algorithms can effectively find the Nash equilibrium solutions for the stochastic multiplayer differential games.

Adaptive Optimal Control for Stochastic Multiplayer Differential Games Using On-Policy and Off-Policy Reinforcement Learning

期刊

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Adaptive Optimal Control for Stochastic Multiplayer Differential Games Using On-Policy and Off-Policy Reinforcement Learning

期刊

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文