Article

Research on Wargame Decision-Making Method Based on Multi-Agent Deep Deterministic Policy Gradient

APPLIED SCIENCES-BASEL (2023)

Journal

APPLIED SCIENCES-BASEL

Volume 13, Issue 7

Publisher

MDPI

DOI: 10.3390/app13074569

Keywords

wargame; decision-making; reinforcement learning; policy gradient; multi-agent

Categories

Chemistry, Multidisciplinary; Engineering, Multidisciplinary; Materials Science, Multidisciplinary; Physics, Applied

Summary

Wargames have become essential for simulating different war scenarios, but traditional decision-making methods are no longer effective. To address this, a wargame decision-making method based on Multi-Agent Deep Deterministic Policy Gradient (MADDPG) is proposed. The method uses a Partially Observable Markov Decision Process (POMDP) formulation and the Gumbel-Softmax estimator to adapt the MADDPG algorithm to the wargame environment.

Abstract

Wargames are essential simulators for various war scenarios. However, the increasing pace of warfare has rendered traditional wargame decision-making methods inadequate. To address this challenge, wargame-assisted decision-making methods that leverage artificial intelligence techniques, notably reinforcement learning, have emerged as a promising solution. Current wargame environments, however, have a large decision space and sparse rewards, both of which hinder the optimization of decision-making methods. To overcome these hurdles, a wargame decision-making method based on Multi-Agent Deep Deterministic Policy Gradient (MADDPG) is presented. A Partially Observable Markov Decision Process (POMDP) formulation, a joint action-value function, and the Gumbel-Softmax estimator are applied to adapt MADDPG to the wargame environment, and the decision-making method is built on this improved algorithm. The proposed approach uses supervised learning to improve training efficiency and to reduce the space for manipulation before the reinforcement learning phase. In addition, a policy gradient estimator is incorporated to reduce the action space and to obtain the global optimal solution, and an additional reward function is designed to address the sparse reward problem. The experimental results show that the proposed method outperforms both the pre-optimization algorithm and other algorithms based on the actor-critic (AC) framework in the wargame environment. The approach offers a promising solution to the challenging problem of decision-making in wargame scenarios, particularly given the increasing speed and complexity of modern warfare.
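The abstract names the Gumbel-Softmax estimator without describing how it is used. As general background only, the sketch below shows the standard form of that estimator: Gumbel noise is added to the action logits and a temperature-scaled softmax yields a relaxed one-hot sample, which is what lets a deterministic policy gradient method such as MADDPG handle discrete actions. The function names, the three-action logits, and the temperature value are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=None):
    """Return a relaxed one-hot sample over len(logits) discrete actions."""
    rng = rng or random.Random()
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        # Gumbel(0, 1) noise is -log(-log(u)); add it, then scale by temperature.
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def to_one_hot(probs):
    """Discretize a relaxed sample into a one-hot action vector."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(probs))]


# Hypothetical logits for three discrete actions (illustrative values only).
sample = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=random.Random(0))
action = to_one_hot(sample)
```

Lower temperatures push the relaxed sample closer to a one-hot vector at the cost of higher-variance gradients. This plain-Python version has no automatic differentiation, so it illustrates the sampling step only; in practice a framework with autodiff would be needed to backpropagate through the sample.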
