ON ADAPTIVE ESTIMATION FOR DYNAMIC BERNOULLI BANDITS

FOUNDATIONS OF DATA SCIENCE (2019)

Journal

FOUNDATIONS OF DATA SCIENCE

Volume 1, Issue 2, Pages 197-225

Publisher

AMER INST MATHEMATICAL SCIENCES-AIMS

DOI: 10.3934/fods.2019009

Keywords

Dynamic bandits; Bernoulli bandits; adaptive estimation; UCB; Thompson sampling

Categories

Mathematics, Applied; Statistics & Probability

Abstract

The multi-armed bandit (MAB) problem is a classic example of the exploration-exploitation dilemma. It is concerned with maximising the total rewards for a gambler by sequentially pulling an arm from a multi-armed slot machine where each arm is associated with a reward distribution. In static MABs, the reward distributions do not change over time, while in dynamic MABs, each arm's reward distribution can change, and the optimal arm can switch over time. Motivated by many real applications where rewards are binary, we focus on dynamic Bernoulli bandits. Standard methods like epsilon-Greedy and Upper Confidence Bound (UCB), which rely on the sample mean estimator, often fail to track changes in the underlying reward for dynamic problems. In this paper, we overcome the shortcoming of slow response to change by deploying adaptive estimation in the standard methods and propose a new family of algorithms, which are adaptive versions of epsilon-Greedy, UCB, and Thompson sampling. These new methods are simple and easy to implement. Moreover, they do not require any prior knowledge about the dynamic reward process, which is important for real applications. We examine the new algorithms numerically in different scenarios and the results show solid improvements of our algorithms in dynamic environments.
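To illustrate the abstract's point that a sample mean estimator responds slowly to change, the following is a minimal sketch, not the paper's algorithm. The abstract does not specify the adaptive estimator, so this example substitutes an exponentially weighted mean with a fixed forgetting factor `lam` (an assumed value of 0.95) as a stand-in for adaptive estimation. The class names `SampleMean` and `ForgettingMean` and the helper `feed` are hypothetical and introduced only for illustration.

```python
class SampleMean:
    """Running sample mean: every observation has equal weight."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n


class ForgettingMean:
    """Exponentially weighted mean with a FIXED forgetting factor.

    Assumption: lam is chosen by hand here. An adaptive scheme would
    tune it online; that part is not shown.
    """

    def __init__(self, lam=0.95):
        self.lam = lam
        self.weighted_sum = 0.0    # discounted sum of rewards
        self.weighted_count = 0.0  # discounted number of observations

    def update(self, x):
        self.weighted_sum = self.lam * self.weighted_sum + x
        self.weighted_count = self.lam * self.weighted_count + 1.0

    @property
    def mean(self):
        if self.weighted_count == 0.0:
            return 0.0
        return self.weighted_sum / self.weighted_count


def feed(estimator, rewards):
    """Apply a sequence of binary rewards to an estimator and return it."""
    for r in rewards:
        estimator.update(r)
    return estimator


# An arm that pays 0 for 100 pulls and then switches to paying 1.
rewards = [0] * 100 + [1] * 100
sample = feed(SampleMean(), rewards)
forgetting = feed(ForgettingMean(lam=0.95), rewards)

print(f"sample mean:     {sample.mean:.3f}")
print(f"forgetting mean: {forgetting.mean:.3f}")
```

On this deliberately extreme reward sequence, the sample mean still averages over the stale first half, while the discounted estimate sits close to the arm's current reward. A smaller `lam` reacts faster but gives noisier estimates on genuinely random rewards, which is the trade-off an adaptive choice of the forgetting factor is meant to manage.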
