VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

2020-01-02

Abstract

Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning. A Bayes-optimal policy, which does so optimally, conditions its actions not only on the environment state but on the agent’s uncertainty about the environment. Computing a Bayes-optimal policy is however intractable for all but the smallest tasks. In this paper, we introduce variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to perform approximate inference in an unknown environment, and incorporate task uncertainty directly during action selection. In a grid-world domain, we illustrate how variBAD performs structured online exploration as a function of task uncertainty. We also evaluate variBAD on MuJoCo domains widely used in meta-RL and show that it achieves higher return during training than existing methods.
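The abstract's central idea is a policy that conditions on both the environment state and a learned posterior over the unknown task. Below is a minimal illustrative sketch of that architecture in PyTorch, not the authors' implementation; the class names (`TrajectoryEncoder`, `BeliefPolicy`) and dimensions are assumptions. An RNN encoder maps the trajectory so far to a Gaussian posterior over a latent task variable, and the policy receives the posterior's mean and standard deviation alongside the state, so its behaviour can depend on task uncertainty.

```python
# Minimal sketch of a belief-conditioned policy in the spirit of variBAD.
# Hypothetical names and sizes; not the paper's reference code.
import torch
import torch.nn as nn

class TrajectoryEncoder(nn.Module):
    """Encodes the history of (state, action, reward) tuples into a
    Gaussian posterior over the latent task variable m:
    q(m | tau_{:t}) = N(mu, sigma^2)."""
    def __init__(self, state_dim, action_dim, latent_dim, hidden_dim=64):
        super().__init__()
        self.rnn = nn.GRU(state_dim + action_dim + 1, hidden_dim, batch_first=True)
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.log_sigma = nn.Linear(hidden_dim, latent_dim)

    def forward(self, states, actions, rewards):
        # states: (B, T, state_dim), actions: (B, T, action_dim), rewards: (B, T, 1)
        x = torch.cat([states, actions, rewards], dim=-1)
        h, _ = self.rnn(x)
        h_t = h[:, -1]  # belief after the most recent transition
        return self.mu(h_t), self.log_sigma(h_t).exp()

class BeliefPolicy(nn.Module):
    """Policy conditioned on the current state AND the belief (mu, sigma),
    so action selection can trade off exploration against exploitation."""
    def __init__(self, state_dim, latent_dim, action_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 2 * latent_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, state, mu, sigma):
        return self.net(torch.cat([state, mu, sigma], dim=-1))
```

In the paper, the encoder is trained as part of a variational autoencoder whose decoder reconstructs rewards and transitions (an ELBO objective), while the policy is trained with standard RL on the belief-augmented state; the sketch above omits the decoder and the training loops.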
