☆ 4.7 Article

Thompson Sampling Algorithms for Cascading Bandits

JOURNAL OF MACHINE LEARNING RESEARCH (2021)

Journal

JOURNAL OF MACHINE LEARNING RESEARCH

Volume 22, Issue -, Pages -

Publisher

MICROTOME PUBL

Keywords

Multi-armed bandits; Thompson sampling; Cascading bandits; Linear bandits; Regret minimization

Funding

National University of Singapore Start-Up Grant [R-266-000-136-133]
Singapore National Research Foundation (NRF) Fellowship [R-263-000-D02-281]
Singapore Ministry of Education AcRF Tier 1 Grant [R-263-000-E80-114]
National Research Foundation, Singapore under its AI Singapore Programme (AISG) [AISG2-RP-2020-018]

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Automated Summary New
Abstract

Motivated by the need for efficient optimization in online recommender systems, this paper revisits the cascading bandit model and provides theoretical guarantees for TS algorithms in a stochastic combinatorial bandit problem model with partial feedback. The proposed TS algorithms demonstrate superiority over existing UCB-based algorithms in numerical experiments.

Motivated by the important and urgent need for efficient optimization in online recommender systems, we revisit the cascading bandit model proposed by Kveton et al. (2015a). While Thompson sampling (TS) algorithms have been shown to be empirically superior to Upper Confidence Bound (UCB) algorithms for cascading bandits, theoretical guarantees are only known for the latter. In this paper, we first provide a problem-dependent upper bound on the regret of a TS algorithm with Beta-Bernoulli updates; this upper bound is tighter than a recent derivation under a more general setting by Huyuk and Tekin (2019). Next, we design and analyze another TS algorithm with Gaussian updates, TS-CASCADE. TS-CASCADE achieves the state-of-the-art problem-independent regret bound for cascading bandits. Complementarily, we consider a linear generalization of the cascading bandit model, which allows efficient learning in large-scale cascading bandit problem instances. We introduce and analyze a TS algorithm, which enjoys a regret bound that depends on the dimension of the linear model but not the number of items. Finally, by using information theoretic techniques and a judicious construction of cascading bandit instances, we derive a nearly-matching lower bound on the expected regret for the standard model. Our paper establishes the first theoretical guarantees on TS algorithms for a stochastic combinatorial bandit problem model with partial feedback. Numerical experiments demonstrate the superiority of the proposed TS algorithms compared to existing UCB-based ones.

Thompson Sampling Algorithms for Cascading Bandits

Journal

JOURNAL OF MACHINE LEARNING RESEARCH

Publisher

MICROTOME PUBL

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Thompson Sampling Algorithms for Cascading Bandits

Journal

JOURNAL OF MACHINE LEARNING RESEARCH

Publisher

MICROTOME PUBL

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper