SLATEQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets


2019-10-08

Abstract: Reinforcement learning (RL) methods for recommender systems optimize recommendations for long-term user engagement. However, users are often presented with slates of multiple items, which may have interacting effects on user choice, so methods are required to deal with the combinatorics of the RL action space. We develop SLATEQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of its component item-wise LTVs. We demonstrate our methods in simulation, and validate the scalability and effectiveness of decomposed TD-learning on YouTube.
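To make the decomposition idea in the abstract concrete, here is a minimal sketch of one form it could take: the value of a slate is written as the choice-probability-weighted sum of item-wise LTVs. The abstract does not specify the choice model, so the conditional-logit model with a "no click" option used here is an assumption for illustration, and the names `choice_probs`, `slate_ltv`, and `null_score` are hypothetical.

```python
import math


def choice_probs(scores, null_score=0.0):
    """Probability that the user picks each slate item.

    Assumption: a conditional-logit choice model in which the user selects
    at most one item, with `null_score` scoring the "no click" option.
    """
    weights = [math.exp(s) for s in scores]
    denom = sum(weights) + math.exp(null_score)
    return [w / denom for w in weights]


def slate_ltv(scores, item_ltvs, null_score=0.0):
    """Slate value as a weighted sum of item-wise LTVs.

    Each item's LTV is weighted by the probability that the user chooses it
    from this slate; the "no click" outcome contributes zero in this sketch.
    """
    probs = choice_probs(scores, null_score)
    return sum(p * q for p, q in zip(probs, item_ltvs))


# Two equally attractive items with item-wise LTVs 1.0 and 3.0.
value = slate_ltv([0.0, 0.0], [1.0, 3.0])
```

Because the slate value depends on items only through their scores and item-wise LTVs, a learner in this sketch would only need to estimate one value per (state, item) pair rather than one per slate.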
