资源论文Bandit Learning in Concave N -Person Games

Bandit Learning in Concave N -Person Games

2020-02-14 | |  41 |   43 |   0

Abstract 

This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely lowinformation environments where the agents may not even know they are playing a game; as such, the agents’ most sensible choice in this setting would be to employ a no-regret learning algorithm. In general, this does not mean that the players’ behavior stabilizes in the long run: no-regret learning may lead to cycles, even with perfect gradient information. However, if a standard monotonicity condition is satisfied, our analysis shows that no-regret learning based on mirror descent with bandit feedback converges to Nash equilibrium with probability 1. We also derive an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization.

上一篇:L4: Practical loss-based stepsize adaptation for deep learning

下一篇:Sanity Checks for Saliency Maps

用户评价
全部评价

热门资源

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to learn...

    The move from hand-designed features to learned...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...