Cascading Non-Stationary Bandits: Online Learning to Rank in the Non-Stationary Cascade Model

资源分类

2019-10-08 |

68 |

41 |

Abstract Non-stationarity appears in many online applications such as web search and advertising. In this paper, we study the online learning to rank problem in a non-stationary environment where user preferences change abruptly at an unknown moment in time. We consider the problem of identifying the K most attractive items and propose cascading nonstationary bandits, an online learning variant of the cascading model, where a user browses a ranked list from top to bottom and clicks on the first attractive item. We propose two algorithms for solving this non-stationary problem: CascadeDUCB and CascadeSWUCB. We analyze their performance and derive gap-dependent upper bounds on the nstep regret of these algorithms. We also establish a lower bound on the regret for cascading nonstationary bandits and show that both algorithms match the lower bound up to a logarithmic factor. Finally, we evaluate their performance on a realworld web search click dataset

上一篇：ARMIN: Towards a More Efficient and Light-weight Recurrent Memory Network

下一篇：Dense Transformer Networks for Brain Electron Microscopy Image Segmentation

用户评价

全部评价

还没有评论，说两句吧！

热门资源

The Variational S...

Unlike traditional images which do not offer in...
Learning to Predi...

Much of model-based reinforcement learning invo...
Stratified Strate...

In this paper we introduce Stratified Strategy ...
Learning to learn...

The move from hand-designed features to learned...
A Mathematical Mo...

Direct democracy, where each voter casts one vo...

智能在线

400-630-6780
聆听.建议反馈

E-mail: support@tusaishared.com