Article

Residual Q-Networks for Value Function Factorizing in Multiagent Reinforcement Learning

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNNLS.2022.3183865

Keywords

Deep learning; multiagent reinforcement learning (MARL); task cooperation; value function factorization

Funding

  1. Engineering and Physical Sciences Research Council, U.K., Multimodal Imitation Learning in Multi-Agent Environments (MIMIC) [EP/T000783/1]

Abstract

The study introduces residual Q-networks (RQNs) for multiagent reinforcement learning (MARL), which improve training efficiency and stability and demonstrate more robust performance across a range of environments.
Multiagent reinforcement learning (MARL) is useful in many problems that require the cooperation and coordination of multiple agents. Learning optimal policies using reinforcement learning in a multiagent setting can be very difficult as the number of agents increases. Recent solutions such as value decomposition networks (VDNs), QMIX, QTRAN, and QPLEX adhere to the centralized training and decentralized execution (CTDE) scheme and perform factorization of the joint action-value functions. However, these methods still struggle as environmental complexity increases and at times fail to converge stably. We propose a novel concept of residual Q-networks (RQNs) for MARL, which learns to transform the individual Q-value trajectories in a way that preserves the individual-global-max (IGM) criterion but is more robust in factorizing action-value functions. The RQN acts as an auxiliary network that accelerates convergence and becomes obsolete as the agents reach the training objectives. The performance of the proposed method is compared against several state-of-the-art techniques, such as QPLEX, QMIX, QTRAN, and VDN, in a range of multiagent cooperative tasks. The results illustrate that the proposed method, in general, converges faster with greater stability and performs robustly across a wider family of environments. The improvements are most prominent in environments that severely punish noncooperative behavior, and especially when complete state information is unavailable during training.
