Abstract
We study two problems of online learning under restricted information access. In the first problem, prediction with limited advice, we consider a game of prediction with expert advice, where on each round of the game we query the advice of a subset of $M$ out of $N$ experts. We present an algorithm that achieves $O\left(\sqrt{\frac{N}{M}\,T\ln N}\right)$ regret on $T$ rounds of this game. The second problem, the multiarmed bandit with paid observations, is a variant of the adversarial $N$-armed bandit game, where on round $t$ of the game we can observe the reward of any number of arms, but each observation has a cost $c$. We present an algorithm that achieves $O\left(\left(cN\ln N\right)^{1/3}T^{2/3}\right)$ regret on $T$ rounds of this game in the worst case. Furthermore, we present a number of refinements that treat arm- and time-dependent observation costs and achieve lower regret under benign conditions. We present lower bounds showing that, apart from the logarithmic factors, the worst-case regret bounds cannot be improved.