资源论文Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring

Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring

2020-02-04 | |  71 |   42 |   0

Abstract

 Partial monitoring is a general model for sequential learning with limited feedback formalized as a game between two players. In this game, the learner chooses an action and at the same time the opponent chooses an outcome, then the learner suffers a loss and receives a feedback signal. The goal of the learner is to minimize the total loss. In this paper, we study partial monitoring with finite actions and stochastic outcomes. We derive a logarithmic distribution-dependent regret lower bound that defines the hardness of the problem. Inspired by the DMED algorithm (Honda and Takemura, 2010) for the multi-armed bandit problem, we propose PM-DMED, an algorithm that minimizes the distribution-dependent regret. PM-DMED significantly outperforms state-of-the-art algorithms in numerical experiments. To show the optimality of PM-DMED with respect to the regret bound, we slightly modify the algorithm by introducing a hinge function (PMDMED-Hinge). Then, we derive an asymptotically optimal regret upper bound of PM-DMED-Hinge that matches the lower bound.

上一篇:Saliency, Scale and Information: Towards a Unifying Theory

下一篇:Finite-Time Analysis of Projected Langevin Monte Carlo

用户评价
全部评价

热门资源

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...