Journal
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
Volume 33, Issue 4, Pages 1584-1593
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNNLS.2020.3042943
Keywords
Games; Neural networks; Multi-agent systems; Markov processes; Training; Task analysis; Reinforcement learning; Deep deterministic policy gradient (DDPG); multiagent system; reinforcement learning (RL); StarCraft; stochastic environment
Funding
- National Science Foundation [CNS 1947418, ECCS 1947419]
Abstract
In this article, we propose a novel semicentralized deep deterministic policy gradient (SCDDPG) algorithm for cooperative multiagent games. Specifically, we design a two-level actor-critic structure to help the agents with interactions and cooperation in the StarCraft combat. The local actor-critic structure is established for each kind of agent with partially observable information received from the environment. Then, the global actor-critic structure is built to provide the local design an overall view of the combat based on the limited centralized information, such as the health value. These two structures work together to generate the optimal control action for each agent and to achieve better cooperation in the games. Compared with fully centralized methods, this design reduces the communication burden by sending only limited information to the global level during the learning process. Furthermore, the reward functions are also designed for both local and global structures based on the agents' attributes to further improve the learning performance in the stochastic environment. The developed method has been demonstrated on several scenarios in a real-time strategy game, i.e., StarCraft. The simulation results show that the agents can effectively cooperate with their teammates and defeat the enemies in various StarCraft scenarios.
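The two-level layout described in the abstract can be sketched roughly as follows. This is a minimal illustrative sketch only: the class names, the use of simple linear maps in place of deep networks, and the way local and global actions are combined are all assumptions for illustration, since the paper's actual network architectures, losses, and StarCraft interface are not given here.

```python
import random

class ActorCritic:
    """Hypothetical stand-in for one actor-critic pair (linear actor only)."""
    def __init__(self, obs_dim, act_dim, seed=0):
        rng = random.Random(seed)
        # Random linear policy weights; a real SCDDPG agent would use
        # deep actor and critic networks trained with DDPG-style updates.
        self.w_actor = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                        for _ in range(act_dim)]

    def act(self, obs):
        # Deterministic policy: linear map from observation to action.
        return [sum(w * o for w, o in zip(row, obs)) for row in self.w_actor]

class SCDDPGAgents:
    """Semicentralized layout: one local actor-critic per agent type, plus a
    global actor-critic that sees only limited centralized information
    (e.g., health values), reducing communication versus full centralization."""
    def __init__(self, agent_types, local_obs_dim, global_info_dim, act_dim):
        self.local = {t: ActorCritic(local_obs_dim, act_dim, seed=i)
                      for i, t in enumerate(agent_types)}
        self.global_ac = ActorCritic(global_info_dim, act_dim, seed=99)

    def control(self, agent_type, local_obs, global_info):
        # How the two levels are fused is an assumption here; we simply add
        # a global corrective term to the local action.
        local_a = self.local[agent_type].act(local_obs)
        global_a = self.global_ac.act(global_info)
        return [la + ga for la, ga in zip(local_a, global_a)]
```

For example, with two agent types ("marine", "medic"), each agent queries its type's local structure with its partial observation, while the shared global structure contributes a term computed from the limited centralized information.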
Authors