Planning in entropy-regularized Markov decision processes and games

2020-02-21

Abstract

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser exploits the smoothness of the Bellman operator induced by the regularization to achieve a problem-independent sample complexity of order Õ(1/ε⁴) for a desired accuracy ε, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
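To illustrate the smoothness the abstract refers to: entropy regularization replaces the hard max in the Bellman backup with a log-sum-exp, which is a smooth (Lipschitz-gradient) function of the Q-values. Below is a minimal sketch of one such regularized backup on a tabular MDP; the function name, array shapes, and the temperature parameter `lam` are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def soft_bellman_backup(V, R, P, gamma=0.9, lam=1.0):
    """One entropy-regularized Bellman backup (illustrative sketch).

    V:   (S,)     current value estimates
    R:   (S, A)   rewards
    P:   (S, A, S) transition probabilities
    lam: regularization temperature; lam -> 0 recovers the hard max
    """
    Q = R + gamma * P @ V          # (S, A) one-step Q-values
    m = Q.max(axis=1)
    # numerically stable log-sum-exp: a smooth upper bound on max_a Q(s, a),
    # and it is this smoothness that the regularization provides
    return m + lam * np.log(np.exp((Q - m[:, None]) / lam).sum(axis=1))
```

For small `lam` the backup approaches the standard (non-regularized) Bellman optimality backup, while larger `lam` yields a smoother operator at the cost of a regularization bias.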

