Article

Self-play reinforcement learning with comprehensive critic in computer games

Journal

NEUROCOMPUTING
Volume 449, Pages 207-213

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2021.04.006

Keywords

Reinforcement learning; Self-play; Computer game


The study introduces a self-play actor-critic (SPAC) method for training agents in computer games, incorporating a comprehensive critic into the policy gradient method. Results demonstrate that agents trained with SPAC outperform other algorithms under various evaluation approaches, showcasing the effectiveness of the comprehensive critic in the self-play training process.

Self-play reinforcement learning, where agents learn by playing against themselves, has been successfully applied in many game scenarios. However, the training procedure for self-play reinforcement learning is unstable and less sample-efficient than general reinforcement learning, especially in imperfect-information games. To improve the self-play training process, we incorporate a comprehensive critic into the policy gradient method to form a self-play actor-critic (SPAC) method for training agents to play computer games. We evaluate our method in four different environments on both competitive and cooperative tasks. The results show that the agent trained with our SPAC method outperforms those trained with the deep deterministic policy gradient (DDPG) and proximal policy optimization (PPO) algorithms under many different evaluation approaches, which vindicates the effectiveness of our comprehensive critic in the self-play training procedure. © 2021 Elsevier B.V. All rights reserved.
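
The abstract does not give implementation details of the comprehensive critic, so the following is only a minimal, generic sketch of the self-play actor-critic idea the paper builds on: one shared actor-critic network controls both players, and a learned value baseline (the critic) is used in the policy-gradient update. All names (`ActorCritic`, `update`, the network sizes, the loss weighting) are illustrative assumptions, not the authors' SPAC implementation.

```python
# Minimal, generic self-play actor-critic sketch (illustrative assumption,
# not the paper's SPAC / comprehensive-critic implementation).
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """One shared network: in self-play, both players act from the same policy."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.pi = nn.Linear(hidden, act_dim)   # actor head: action logits
        self.v = nn.Linear(hidden, 1)          # critic head: state value

    def forward(self, obs):
        h = self.body(obs)
        dist = torch.distributions.Categorical(logits=self.pi(h))
        return dist, self.v(h).squeeze(-1)

def update(net, optimizer, obs, actions, returns):
    """One policy-gradient step with a critic (value) baseline.

    `obs`, `actions`, `returns` are batched tensors pooled from the
    trajectories of *both* players, which is the essence of self-play:
    every transition, from either side of the game, trains the same network.
    """
    dist, values = net(obs)
    advantages = returns - values.detach()                 # critic as baseline
    policy_loss = -(dist.log_prob(actions) * advantages).mean()
    value_loss = (returns - values).pow(2).mean()          # critic regression
    loss = policy_loss + 0.5 * value_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In this sketch the critic is a plain value baseline conditioned on one player's observation; the paper's "comprehensive critic" is a richer construction, but its inputs and structure are not specified in the abstract and are therefore not modeled here.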
