Article

Multiagent reinforcement learning for strictly constrained tasks based on Reward Recorder

Journal

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS
Volume 37, Issue 11, Pages 8387-8411

Publisher

WILEY
DOI: 10.1002/int.22945

Keywords

distributed optimization; multiagent reinforcement learning; reward recorder; strictly constrained task

Funding

  1. Key-Area Research and Development Program of Guangdong Province (CN) [2019B111109002]

This paper analyzes the application of multiagent reinforcement learning (MARL) to engineering problems and finds that strict global constraints can lead to sparse rewards. To address this issue, it proposes a fully distributed and convergent MARL algorithm based on Reward Recorder. Simulation examples indicate that the proposed algorithm has high stability and excellent decision-making ability.
Multiagent reinforcement learning (MARL) has been widely applied to engineering problems. However, many strictly constrained problems, such as distributed optimization in engineering applications, remain a great challenge for MARL. Strict global constraints on agents' actions, in particular, readily lead to sparse rewards. In addition, existing studies cannot resolve the instability caused by partial observability while keeping the algorithm fully distributed, and algorithms that rely on centralized training may encounter significant obstacles in real-world deployment. For the first time, we provide a theoretical analysis for MARL that determines the adverse effects of partial observability on convergence, and we propose a fully distributed and convergent MARL algorithm based on Reward Recorder. Each agent runs an independent reinforcement learning algorithm and uses the average-consensus protocol to estimate the global state-action value locally, which allows the agents to achieve global optimization. To verify the performance of the algorithm, we propose a novel generalized constrained optimization model that includes local inequality constraints and strict global constraints. The proposed distributed reinforcement learning algorithm is supported by several simulation examples. The results reveal that the proposed algorithm has high stability and excellent decision-making ability.
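
The abstract does not spell out how the average-consensus protocol is run, so the following is only a minimal sketch of the standard technique, not the authors' implementation. It assumes a fixed, connected, undirected communication graph and Metropolis mixing weights; the function names, the four-agent ring, and the initial local value estimates are all hypothetical and chosen for illustration. Each agent repeatedly mixes its own estimate with those of its direct neighbors, and every estimate converges to the network-wide average without any central coordinator.

```python
# Illustrative sketch of average consensus (assumed setup, not the paper's code).

def metropolis_weights(neighbors):
    """Build a symmetric, doubly stochastic weight matrix from an adjacency dict."""
    n = len(neighbors)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            # Each edge weight depends only on the degrees of its two endpoints.
            W[i][j] = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
        # The diagonal is still 0 here, so the sum covers the off-diagonal entries.
        W[i][i] = 1.0 - sum(W[i])
    return W

def consensus_step(values, W, neighbors):
    """Each agent mixes its own estimate with its neighbors' estimates only."""
    return [
        W[i][i] * values[i] + sum(W[i][j] * values[j] for j in neighbors[i])
        for i in range(len(values))
    ]

def run_consensus(values, neighbors, steps=200):
    """Iterate the consensus update for a fixed number of communication rounds."""
    W = metropolis_weights(neighbors)
    for _ in range(steps):
        values = consensus_step(values, W, neighbors)
    return values

# Hypothetical example: a ring of four agents, each holding a local value estimate.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
local_q = [1.0, 3.0, 5.0, 7.0]
estimates = run_consensus(local_q, neighbors)
```

In this sketch every entry of `estimates` approaches the mean of `local_q`. If the global value of interest is a sum rather than an average, an agent that knows the number of agents can rescale its local estimate accordingly.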
