Article; Proceedings Paper

Exploration-exploitation tradeoff using variance estimates in multi-armed bandits

THEORETICAL COMPUTER SCIENCE (2009)

Journal

THEORETICAL COMPUTER SCIENCE

Volume 410, Issue 19, Pages 1876-1902

Publisher

ELSEVIER

DOI: 10.1016/j.tcs.2009.01.016

Keywords

Exploration-exploitation tradeoff; Multi-armed bandits; Bernstein's inequality; High-probability bound; Risk analysis

Category

Computer Science, Theory & Methods

Abstract

Algorithms based on upper confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient and effective. This paper considers a variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms. In earlier experimental works, such algorithms were found to outperform the competing algorithms. We provide the first analysis of the expected regret for such algorithms. As expected, our results show that the algorithm that uses the variance estimates has a major advantage over its alternatives that do not use such estimates, provided that the variances of the payoffs of the suboptimal arms are low. We also prove that the regret concentrates only at a polynomial rate. This holds for all the upper confidence bound based algorithms and for all bandit problems except those special ones where, with probability one, the payoff obtained by pulling the optimal arm is larger than the expected payoff for the second best arm. Hence, although upper confidence bound bandit algorithms achieve logarithmic expected regret rates, they might not be suitable for a risk-averse decision maker. We illustrate some of the results by computer simulations. © 2009 Elsevier B.V. All rights reserved.
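The abstract describes an upper-confidence-bound rule whose index also uses each arm's empirical variance. The following is a minimal sketch of that idea, not the authors' code. It assumes rewards in [0, b], an exploration function E = ζ log t, and an index of the form X̄ + sqrt(2·V·E/s) + c·3bE/s, where X̄ and V are the empirical mean and variance of an arm pulled s times. The function names `ucbv_index` and `run_ucbv` and the defaults ζ = 1.2, c = 1 are hypothetical choices for illustration.

```python
import math
import random


def ucbv_index(mean, var, pulls, t, b=1.0, zeta=1.2, c=1.0):
    """Variance-aware upper confidence bound for one arm (illustrative sketch).

    Assumed form (hypothetical constants, not taken from the paper):
        mean + sqrt(2 * var * E / pulls) + c * 3 * b * E / pulls,
    with exploration function E = zeta * log(t) and rewards in [0, b].
    """
    if pulls == 0:
        return math.inf  # every arm is tried once before indices are compared
    explore = zeta * math.log(t)
    return mean + math.sqrt(2.0 * var * explore / pulls) + c * 3.0 * b * explore / pulls


def run_ucbv(probs, horizon, seed=0):
    """Play Bernoulli arms with the sketched index; return the pull count per arm."""
    rng = random.Random(seed)
    k = len(probs)
    pulls = [0] * k
    sums = [0.0] * k
    sq_sums = [0.0] * k
    for t in range(1, horizon + 1):
        indices = []
        for i in range(k):
            if pulls[i] == 0:
                indices.append(math.inf)
                continue
            mean = sums[i] / pulls[i]
            # Empirical variance with 1/s normalisation, clamped against rounding error.
            var = max(0.0, sq_sums[i] / pulls[i] - mean * mean)
            indices.append(ucbv_index(mean, var, pulls[i], t))
        arm = max(range(k), key=lambda i: indices[i])  # ties go to the lowest index
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        sq_sums[arm] += reward * reward
    return pulls


if __name__ == "__main__":
    print(run_ucbv([0.2, 0.8], 2000))
```

With two Bernoulli arms such as `run_ucbv([0.2, 0.8], 2000)`, the higher-mean arm should receive most of the pulls. A suboptimal arm with low empirical variance gets a small square-root term and is abandoned sooner, which is the advantage the abstract attributes to variance estimates. The additive 3bE/s term keeps the bound meaningful while the variance estimate is still based on few samples.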
