Multinomial Logit Bandit with Linear Utility Functions

资源分类

2019-11-05 |

56 |

37 |

Abstract Multinomial logit bandit is a sequential subset selection problem which arises in many applications. In each round, the player selects a K-cardinality subset from N candidate items, and receives a reward which is governed by a multinomial logit (MNL) choice model considering both item utility and substitution property among items. The player’s objective is to dynamically learn the parameters of MNL model and maximize cumulative reward over a finite horizon T . This problem faces the exploration-exploitation dilemma, and the involved combinatorial nature makes it non-trivial. In recent years, there have developed some algorithms by exploiting specific characteristics of the MNL model, but all of them estimate the parameters of MNL model?separately and incur a regret no better than O? N T which is not preferred for large candidate set size N . In this paper, we consider the linear utility MNL choice model whose item utilities are represented as linear functions of ddimension item features, and propose an algorithm, titled LUMB, to exploit the underlying structure. It is proven ? that the proposed algorithm achieves O? dK T regret which is free of candidate set size. Experiments show the superiority of the proposed algorithm.

上一篇：A Degeneracy Framework for Graph Similarity

下一篇：Generalization-Aware Structured Regression towards Balancing Bias and Variance

用户评价

全部评价

还没有评论，说两句吧！

热门资源

Learning to Predi...

Much of model-based reinforcement learning invo...
Stratified Strate...

In this paper we introduce Stratified Strategy ...
The Variational S...

Unlike traditional images which do not offer in...
A Mathematical Mo...

Direct democracy, where each voter casts one vo...
Rating-Boosted La...

The performance of a recommendation system reli...

智能在线

400-630-6780
聆听.建议反馈

E-mail: support@tusaishared.com