Article

A Multiplier Bootstrap Approach to Designing Robust Algorithms for Contextual Bandits

Journal

IEEE Transactions on Neural Networks and Learning Systems

Publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/TNNLS.2022.3161806

Keywords

Tail; Context modeling; Upper bound; Convergence; Program processors; Learning systems; Tuning; Contextual bandits; multiplier bootstrap; second-order correction; upper confidence bound (UCB)

In this study, we propose an estimator based on the multiplier bootstrap technique to improve the application of upper confidence bound (UCB) algorithms in contextual bandit (CB) problems. The estimator adaptively converges to the ground truth and comes with theoretical guarantees on its convergence. Extensive experiments on synthetic and real-world datasets validate the superior performance of the proposed estimator.
Upper confidence bound (UCB)-based contextual bandit algorithms require one to know the tail property of the reward distribution. Unfortunately, such a tail property is usually unknown or difficult to specify in real-world applications. Using a tail property heavier than the ground truth leads to a slow learning speed of the contextual bandit algorithm, while using a lighter one may cause the algorithm to diverge. To address this fundamental problem, we develop an estimator (evaluated from historical rewards) for the contextual bandit UCB based on the multiplier bootstrap technique. Our proposed estimator mitigates the problem of specifying a heavier tail property by adaptively converging to the ground-truth contextual bandit UCB (i.e., eliminating the impact of the specified heavier tail property), with theoretical guarantees on the convergence. The design and convergence analysis of the proposed estimator are technically nontrivial. The proposed estimator is generic and can be applied to improve a variety of UCB-based contextual bandit algorithms. To demonstrate its versatility, we apply it to improve the linear reward contextual bandit UCB (LinUCB) algorithm, resulting in our bootstrapping LinUCB (BootLinUCB) algorithm. We prove that BootLinUCB has sublinear regret. We conduct extensive experiments on both a synthetic dataset and a real-world dataset from Yahoo! to validate the benefits of our proposed estimator in reducing regret and the superior performance of BootLinUCB over the latest baseline.
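
The abstract describes the estimator only at a high level. As a rough illustration of the underlying idea (not the paper's exact estimator or the BootLinUCB algorithm), a minimal multiplier-bootstrap confidence width for a single arm's mean reward could look like the sketch below; the function name, the standard-normal multipliers, and the quantile level are assumptions made for this sketch.

```python
import numpy as np


def multiplier_bootstrap_width(rewards, n_boot=200, alpha=0.05, rng=None):
    """Data-driven confidence width for the mean reward of one arm.

    Sketch of a multiplier-bootstrap width: each replicate reweights the
    centred historical rewards with i.i.d. standard-normal multipliers,
    and the (1 - alpha) quantile of the absolute replicate means stands
    in for the deviation of the empirical mean from the true mean.
    """
    rng = np.random.default_rng() if rng is None else rng
    rewards = np.asarray(rewards, dtype=float)
    centred = rewards - rewards.mean()
    # One row of multipliers per bootstrap replicate, shape (n_boot, n).
    w = rng.standard_normal((n_boot, centred.size))
    replicates = (w * centred).mean(axis=1)
    return float(np.quantile(np.abs(replicates), 1.0 - alpha))


# A UCB-style rule would then score each arm as
#   rewards_of_arm.mean() + multiplier_bootstrap_width(rewards_of_arm)
# instead of plugging in a hand-specified tail bound.
```

The point of the sketch is that the exploration bonus is computed from the observed rewards themselves, so no tail parameter has to be specified up front; the paper develops this idea for the contextual (LinUCB-style) setting with convergence and regret guarantees.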
