Article

THE MULTI-ARMED BANDIT PROBLEM: AN EFFICIENT NONPARAMETRIC SOLUTION

Journal

ANNALS OF STATISTICS
Volume 48, Issue 1, Pages 346-373

Publisher

INST MATHEMATICAL STATISTICS
DOI: 10.1214/19-AOS1809

Keywords

Efficiency; KL-UCB; subsampling; Thompson sampling; UCB

Funding

  1. MOE Grant [R-155-000-158-112]

Lai and Robbins (Adv. in Appl. Math. 6 (1985) 4-22) and Lai (Ann. Statist. 15 (1987) 1091-1114) provided efficient parametric solutions to the multi-armed bandit problem, showing that arm allocation via upper confidence bounds (UCB) achieves minimum regret. These bounds are constructed from the Kullback-Leibler information of the reward distributions, estimated from specified parametric families. In recent years, there has been renewed interest in the multi-armed bandit problem due to new applications in machine learning algorithms and data analytics. Nonparametric arm allocation procedures like epsilon-greedy, Boltzmann exploration and BESA were studied, and modified versions of the UCB procedure were also analyzed under nonparametric settings. However, unlike UCB, these nonparametric procedures are not efficient under general parametric settings. In this paper, we propose efficient nonparametric procedures.
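
To make the idea of arm allocation via a Kullback-Leibler based upper confidence bound concrete, the following is a minimal sketch of a KL-UCB style rule for Bernoulli rewards. It is an illustration only and is not the nonparametric procedure proposed in the paper. The function names (`bernoulli_kl`, `kl_ucb_index`, `run_kl_ucb`), the exploration budget `log(t)`, the bisection depth, and the arm means in the usage note are all assumptions chosen for this example.

```python
import math
import random


def bernoulli_kl(p, q):
    """Kullback-Leibler divergence KL(Bernoulli(p) || Bernoulli(q))."""
    eps = 1e-12  # clamp away from 0 and 1 so the logarithms stay finite
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, t, iters=30):
    """Upper confidence bound for one arm (assumed form, Bernoulli rewards).

    Approximates the largest q in [mean, 1] such that
    pulls * KL(mean, q) <= log(t), using bisection.
    """
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid  # mid is still inside the confidence region
        else:
            hi = mid  # mid is too far from the sample mean
    return lo


def run_kl_ucb(true_means, horizon, seed=0):
    """Simulate the index rule; return pull counts and expected regret."""
    rng = random.Random(seed)
    k = len(true_means)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull every arm once
        else:
            # pull the arm whose upper confidence bound is largest
            arm = max(
                range(k),
                key=lambda a: kl_ucb_index(sums[a] / pulls[a], pulls[a], t),
            )
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
    best = max(true_means)
    # expected regret: pulls of each arm times its gap to the best arm
    regret = sum(n * (best - m) for n, m in zip(pulls, true_means))
    return pulls, regret
```

Calling `run_kl_ucb([0.2, 0.8], horizon=2000)` returns the number of pulls per arm and the expected regret of that run. Under these assumptions the inferior arm should be pulled only rarely, since its index shrinks toward its sample mean as it accumulates pulls while the `log(t)` budget grows slowly.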
