Article

Conservative Contextual Combinatorial Cascading Bandit

Journal

IEEE ACCESS
Volume 9, Issue -, Pages 151434-151443

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2021.3124416

Keywords

Machine learning; multi-armed bandit; online learning; recommender system

The C-4-UCB algorithm addresses the lack of a performance guarantee in the contextual combinatorial cascading bandit framework by integrating the online learning algorithm UCB with a conservative mechanism. The analysis shows that, compared to C-3-bandit, the regret in both the known and unknown baseline reward settings is enlarged only by an additive constant term.
Contextual combinatorial cascading bandit (C-3-bandit) is a powerful multi-armed bandit framework that balances the tradeoff between exploration and exploitation in the learning process. It captures users' click behavior well and has been applied in a broad spectrum of real-world applications such as recommender systems and search engines. However, the framework does not provide a performance guarantee during the initial exploration phase. To that end, we propose the conservative contextual combinatorial cascading bandit (C-4-bandit) model, which aims to address this modeling issue. In this problem, the learning agent is given some contexts, recommends a list of items that is no worse than a baseline strategy, and then observes the reward according to some stopping rule. The objective is to maximize the reward while simultaneously satisfying the safety constraint, i.e., guaranteeing that the algorithm performs at least as well as the baseline strategy. To tackle this new problem, we extend an online learning algorithm, the Upper Confidence Bound (UCB), to handle the critical tradeoff between exploitation and exploration, and employ a conservative mechanism to properly handle the safety constraint. By carefully integrating these two techniques, we develop a new algorithm, called C-4-UCB, for this problem. Further, we rigorously prove n-step regret upper bounds in two situations: known baseline reward and unknown baseline reward. In both situations, the regret is enlarged only by an additive constant term compared to the results for C-3-bandit. Finally, experiments on synthetic and real-world datasets demonstrate its advantages.
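To make the combination of UCB-style optimism and a conservative safety check concrete, the following Python sketch outlines one decision step of a conservative linear-UCB list selection. It is an illustration under assumptions rather than the paper's C-4-UCB pseudocode: the names (confidence_interval, select_conservative_list, alpha, baseline_items, and so on) are hypothetical, the list reward is simplified to a cascade-style 1 - prod(1 - p), and the safety check compares a lower-confidence estimate of cumulative reward against (1 - alpha) times the baseline's cumulative reward, in the spirit of conservative bandits.

import numpy as np

# Illustrative sketch only (hypothetical names throughout): one step of a
# conservative linear-UCB list selection in the spirit of C-4-UCB. This
# simplification assumes a linear payoff model, a cascade-style list reward
# 1 - prod(1 - p), and a conservative-bandit style safety budget.

def confidence_interval(contexts, theta_hat, V_inv, beta):
    """Point estimate and exploration width for each item's attraction weight."""
    mean = contexts @ theta_hat
    width = beta * np.sqrt(np.einsum('ij,jk,ik->i', contexts, V_inv, contexts))
    return mean, width

def select_conservative_list(contexts, theta_hat, V_inv, beta,
                             baseline_items, baseline_reward_lb,
                             cumulative_reward_lb, step, alpha, k):
    """Return the greedy UCB list if the safety constraint can be certified,
    otherwise fall back to the baseline list."""
    mean, width = confidence_interval(contexts, theta_hat, V_inv, beta)
    ucb = np.clip(mean + width, 0.0, 1.0)            # optimistic estimates
    lcb = np.clip(mean - width, 0.0, 1.0)            # pessimistic estimates

    ucb_list = np.argsort(-ucb)[:k]                  # greedy top-k list
    # Pessimistic value of the proposed list under a cascade reward.
    proposed_lb = 1.0 - np.prod(1.0 - lcb[ucb_list])

    # Conservative constraint: the certified cumulative reward, including this
    # step, must stay above (1 - alpha) times the baseline's cumulative reward.
    safe = cumulative_reward_lb + proposed_lb >= (1.0 - alpha) * step * baseline_reward_lb

    if safe:
        return ucb_list, proposed_lb
    return np.asarray(baseline_items)[:k], baseline_reward_lb

A full implementation would additionally maintain the ridge-regression statistics (theta_hat and V) from the partially observed cascading feedback and would track steps on which the baseline list was played, which is what the paper analyzes separately for the known and unknown baseline reward cases.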
