The nonstochastic multiarmed bandit problem

SIAM JOURNAL ON COMPUTING (2003)

Journal

SIAM JOURNAL ON COMPUTING

Volume 32, Issue 1, Pages 48-77

Publisher

SIAM PUBLICATIONS

DOI: 10.1137/s0097539701398375

Keywords

adversarial bandit problem; unknown matrix games

Categories

Computer Science, Theory & Methods; Mathematics, Applied

Abstract

In the multiarmed bandit problem, a gambler must decide which arm of K non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff). Past solutions for the bandit problem have almost always relied on assumptions about the statistics of the slot machines. In this work, we make no statistical assumptions whatsoever about the nature of the process generating the payoffs of the slot machines. We give a solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs. In a sequence of T plays, we prove that the per-round payoff of our algorithm approaches that of the best arm at the rate $O(T^{-1/2})$. We show by a matching lower bound that this is the best possible. We also prove that our algorithm approaches the per-round payoff of any set of strategies at a similar rate: if the best strategy is chosen from a pool of N strategies, then our algorithm approaches the per-round payoff of the strategy at the rate $O((\log N)^{1/2} T^{-1/2})$. Finally, we apply our results to the problem of playing an unknown repeated matrix game. We show that our algorithm approaches the minimax payoff of the unknown game at the rate $O(T^{-1/2})$.
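The abstract states the guarantees but does not spell out the algorithm. As an illustration only, here is a minimal Python sketch of an exponential-weights bandit strategy in the style of Exp3. It is not a reproduction of the paper's pseudocode. The names `exp3`, `reward_fn`, and `gamma`, the fixed exploration rate, and the weight rescaling step are all assumptions made for this example. Rewards are assumed to lie in [0, 1].

```python
import math
import random


def exp3(num_arms, num_rounds, reward_fn, gamma=0.1, rng=None):
    """Illustrative exponential-weights bandit player (Exp3-style sketch).

    reward_fn(t, arm) is a hypothetical callback returning the payoff in
    [0, 1] of the chosen arm at round t; payoffs of other arms stay hidden.
    Returns the total reward collected and the last sampling distribution.
    """
    rng = rng or random.Random(0)
    weights = [1.0] * num_arms
    probs = [1.0 / num_arms] * num_arms
    total_reward = 0.0

    for t in range(num_rounds):
        # Mix the weight-proportional distribution with uniform exploration.
        weight_sum = sum(weights)
        probs = [(1.0 - gamma) * w / weight_sum + gamma / num_arms
                 for w in weights]

        # Sample one arm and observe only its payoff.
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total_reward += reward

        # Importance-weighted estimate: unbiased for the played arm's payoff.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)

        # Rescale to avoid overflow (assumption of this sketch); the
        # sampling probabilities are unchanged by a common scale factor.
        largest = max(weights)
        weights = [w / largest for w in weights]

    return total_reward, probs
```

A possible usage, with a fixed payoff sequence in which only arm 2 ever pays:

```python
total, probs = exp3(3, 2000, lambda t, arm: 1.0 if arm == 2 else 0.0)
```

The uniform mixing term keeps every arm's sampling probability at least `gamma / num_arms`, which bounds the importance-weighted estimates. That is the design choice that lets the strategy cope with payoffs chosen by an adversary rather than drawn from a fixed distribution.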
