Classes of Multiagent Q-learning Dynamics with ε-greedy Exploration


Abstract

Q-learning in single-agent environments is known to converge in the limit given sufficient exploration. The same algorithm has been applied, with some success, in multiagent environments, where traditional analysis techniques break down. Using established dynamical systems methods, we derive and study an idealization of Q-learning in 2-player 2-action repeated general-sum games. In particular, we address the discontinuous case of ε-greedy exploration and use it as a proxy for value-based algorithms to highlight a contrast with existing results in policy search. Analogously to previous results for gradient ascent algorithms, we provide a complete catalog of the convergence behavior of the ε-greedy Q-learning algorithm by introducing new subclasses of these games. We identify two subclasses of Prisoner’s Dilemma-like games where the application of Q-learning with ε-greedy exploration results in higher-than-Nash average payoffs for some initial conditions.
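The setting the abstract describes is easy to state concretely. Below is a minimal, illustrative sketch (not code from the paper) of two independent Q-learners with ε-greedy exploration playing a repeated 2-player 2-action game. The Prisoner's Dilemma payoff matrix, learning rate, discount factor, and exploration rate are all hypothetical choices for illustration.

```python
import random

# Illustrative sketch: two stateless epsilon-greedy Q-learners in a
# repeated 2x2 Prisoner's Dilemma. All constants below are assumptions.

PAYOFFS = {  # (row_action, col_action) -> (row_payoff, col_payoff); 0 = cooperate, 1 = defect
    (0, 0): (3, 3),
    (0, 1): (0, 5),
    (1, 0): (5, 0),
    (1, 1): (1, 1),
}

EPSILON = 0.1  # exploration rate (hypothetical)
ALPHA = 0.01   # learning rate (hypothetical)
GAMMA = 0.9    # discount factor (hypothetical)


def epsilon_greedy(q):
    """Pick the greedy action with probability 1 - EPSILON, else a uniform random action."""
    if random.random() < EPSILON:
        return random.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])


def run(steps=100_000, seed=0):
    random.seed(seed)
    q1, q2 = [0.0, 0.0], [0.0, 0.0]  # one Q-value per action; the game has a single state
    total1 = total2 = 0.0
    for _ in range(steps):
        a1, a2 = epsilon_greedy(q1), epsilon_greedy(q2)
        r1, r2 = PAYOFFS[(a1, a2)]
        # Stateless Q-update: with a single recurring state, the bootstrap
        # target uses the current maximum Q-value of the same table.
        q1[a1] += ALPHA * (r1 + GAMMA * max(q1) - q1[a1])
        q2[a2] += ALPHA * (r2 + GAMMA * max(q2) - q2[a2])
        total1 += r1
        total2 += r2
    return total1 / steps, total2 / steps


if __name__ == "__main__":
    print(run())  # average payoffs; mutual defection (the Nash equilibrium) would give (1, 1)
```

For some initial Q-values and parameter settings, a run like this can spend long stretches in mutual cooperation, yielding average payoffs above the Nash value of 1. That is the kind of higher-than-Nash behavior the paper characterizes analytically for its Prisoner's Dilemma-like subclasses.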
