Cascading Non-Stationary Bandits: Online Learning to Rank in the
Non-Stationary Cascade Model
Abstract
Non-stationarity appears in many online applications such as web search and advertising. In this
paper, we study the online learning to rank problem
in a non-stationary environment where user preferences change abruptly at an unknown moment in
time. We consider the problem of identifying the
K most attractive items and propose cascading nonstationary bandits, an online learning variant of the
cascading model, where a user browses a ranked
list from top to bottom and clicks on the first attractive item. We propose two algorithms for solving
this non-stationary problem: CascadeDUCB and
CascadeSWUCB. We analyze their performance
and derive gap-dependent upper bounds on the nstep regret of these algorithms. We also establish
a lower bound on the regret for cascading nonstationary bandits and show that both algorithms
match the lower bound up to a logarithmic factor.
Finally, we evaluate their performance on a realworld web search click dataset