4.5 Article

User Behavior Simulation for Search Result Re-ranking

期刊

出版社

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3511469

关键词

Information retrieval; ranking; user simulation; reinforcement learning; generative adversarial networks

向作者/读者索取更多资源

This article presents two different simulation environments for offline training of the RL ranking agent: the Context-aware Click Simulator (CCS) and the Fine-grained User Behavior Simulator with GAN (UserGAN). Based on the simulation environment, a User Behavior Simulation for Reinforcement Learning (UBS4RL) re-ranking framework is designed, consisting of three modules: a feature extractor for heterogeneous search results, a user simulator for collecting simulated user feedback, and a ranking agent for generation of optimized result lists.
Result ranking is one of the major concerns for Web search technologies. Most existing methodologies rank search results in descending order of relevance. To model the interactions among search results, reinforcement learning (RL algorithms have been widely adopted for ranking tasks. However, the online training of RL methods is time and resource consuming at scale. As an alternative, learning ranking policies in the simulation environment is much more feasible and efficient. In this article, we propose two different simulation environments for the offline training of the RL ranking agent: the Context-aware Click Simulator (CCS) and the Fine-grained User Behavior Simulator with GAN (UserGAN). Based on the simulation environment, we also design a User Behavior Simulation for Reinforcement Learning ( UBS4RL) re-ranking framework, which consists of three modules: a feature extractor for heterogeneous search results, a user simulator for collecting simulated user feedback, and a ranking agent for generation of optimized result lists. Extensive experiments on both simulated and practical Web search datasets show that (1) the proposed user simulators can capture and simulate fine-grained user behavior patterns by training on large-scale search logs, (2) the temporal information of user searching process is a strong signal for ranking evaluation, and (3) learning ranking policies from the simulation environment can effectively improve the search ranking performance.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.5
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据