☆ 4.7 Article

Master-Slave Deep Architecture for Top-K Multiarmed Bandits With Nonlinear Bandit Feedback and Diversity Constraints

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)

期刊

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

卷 -, 期 -, 页码 -

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TNNLS.2023.3306801

关键词

Combinatorial optimization; recommendation system; reinforcement learning (RL)

类别

Computer Science, Artificial Intelligence Computer Science, Hardware & Architecture Computer Science, Theory & Methods Engineering, Electrical & Electronic

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

A novel master-slave architecture is proposed to solve the top-K combinatorial multiarmed bandits problem with nonlinear bandit feedback and diversity constraints. It significantly outperforms existing algorithms in recommendation tasks.

We propose a novel master-slave architecture to solve the top- K combinatorial multiarmed bandits (CMABs) problem with nonlinear bandit feedback and diversity constraints, which, to the best of our knowledge, is the first combinatorial bandits setting considering diversity constraints under bandit feedback. Specifically, to efficiently explore the combinatorial and constrained action space, we introduce six slave models with distinguished merits to generate diversified samples well balancing rewards and constraints as well as efficiency. Moreover, we propose teacher learning-based optimization and the policy cotraining technique to boost the performance of the multiple slave models. The master model then collects the elite samples provided by the slave models and selects the best sample estimated by a neural contextual UCB-based network (NeuralUCB) to decide on a tradeoff between exploration and exploitation. Thanks to the elaborate design of slave models, the cotraining mechanism among slave models, and the novel interactions between the master and slave models, our approach significantly surpasses existing state-of-the-art algorithms in both synthetic and real datasets for recommendation tasks. The code is available at https://github.com/huanghanchi/Master-slave-Algorithm-for-Top-K-Bandits.

Master-Slave Deep Architecture for Top-K Multiarmed Bandits With Nonlinear Bandit Feedback and Diversity Constraints

期刊

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Master-Slave Deep Architecture for Top-K Multiarmed Bandits With Nonlinear Bandit Feedback and Diversity Constraints

期刊

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

出版社

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文