☆ 4.7 Article Proceedings Paper

Multiagent learning using a variable learning rate

ARTIFICIAL INTELLIGENCE (2002)

期刊

ARTIFICIAL INTELLIGENCE

卷 136, 期 2, 页码 215-250

出版社

ELSEVIER

DOI: 10.1016/S0004-3702(02)00121-2

关键词

multiagent learning; reinforcement learning; game theory

类别

Computer Science, Artificial Intelligence

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on the policies of the other agents. This creates a situation of learning a moving target. Previous learning algorithms have one of two shortcomings depending on their approach. They either converge to a policy that may not be optimal against the specific opponents' policies, or they may not converge at all. In this article we examine this learning problem in the framework of stochastic games. We look at a number of previous learning algorithms showing how they fail at one of the above criteria. We then contribute a new reinforcement learning technique using a variable learning rate to overcome these shortcomings. Specifically, we introduce the WoLF principle, Win or Learn Fast, for varying the learning rate. We examine this technique theoretically, proving convergence in self-play on a restricted class of iterated matrix games. We also present empirical results on a variety of more general stochastic games, in situations of self-play and otherwise, demonstrating the wide applicability of this method. (C) 2002 Published by Elsevier Science B.V.

Multiagent learning using a variable learning rate

期刊

ARTIFICIAL INTELLIGENCE

出版社

ELSEVIER

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Multiagent learning using a variable learning rate

期刊

ARTIFICIAL INTELLIGENCE

出版社

ELSEVIER

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文