Article

Sub-AVG: Overestimation reduction for cooperative multi-agent reinforcement learning

Journal

NEUROCOMPUTING
Volume 474, Issue -, Pages 94-106

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2021.12.039

Keywords

Cooperative multi-agent reinforcement learning; Joint action value decomposition; Overestimation error; Lower update target


Summary

Decomposing the centralized joint action value into per-agent individual action values is attractive in cooperative multi-agent reinforcement learning. However, Q-learning-based methods suffer from overestimation. This paper presents Sub-AVG, which eliminates excessive overestimation errors by using a lower update target.

Abstract

Decomposing the centralized joint action value (JAV) into per-agent individual action values (IAVs) is attractive in cooperative multi-agent reinforcement learning (MARL). In such tasks, IAVs based on local observations can execute decentralized policies, while the JAV is used for end-to-end training with traditional reinforcement learning methods, in particular the Q-learning algorithm. However, Q-learning-based methods suffer from overestimation, in which overestimated action values can lead to a sub-optimal policy. In this paper, we show that such overestimation can occur in the above Q-learning-based decomposition methods. Our solution is Sub-AVG, which uses a lower update target obtained by discarding the larger of the previously learned IAVs and averaging the retained ones, thus eliminating excessive overestimation errors. Experiments in the StarCraft Multi-Agent Challenge (SMAC) environment show that Sub-AVG leads to lower JAV estimates and better-performing policies. (c) 2021 Elsevier B.V. All rights reserved.
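To make the lower-update-target idea concrete, the minimal Python sketch below (not taken from the paper) builds a Sub-AVG-style target from an ensemble of previously learned IAV estimates for the greedy next action. The specific selection rule used here, keeping only estimates at or below the ensemble mean, and the name sub_avg_target are illustrative assumptions; the paper's exact criterion may differ.

import numpy as np

def sub_avg_target(past_iav_estimates, reward, gamma=0.99):
    # past_iav_estimates: K previously learned IAV estimates for the greedy
    # next action, e.g. taken from K snapshots of the target network.
    q = np.asarray(past_iav_estimates, dtype=np.float64)
    kept = q[q <= q.mean()]              # discard the larger estimates (assumed rule)
    return reward + gamma * kept.mean()  # average the retained ones -> lower target

# Toy usage with three hypothetical snapshots of the next-state IAV:
print(sub_avg_target([4.0, 5.5, 7.0], reward=1.0))  # averages only {4.0, 5.5}

Because the retained subset excludes the largest estimates, the resulting bootstrap target is never above the plain ensemble average, which is how the excessive overestimation error is suppressed.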


Reviews

Primary rating

4.6
Insufficient ratings

Secondary ratings

Novelty: -
Significance: -
Scientific rigor: -
