Bayesian Policy Search with Policy Priors

Abstract

We consider the problem of learning to act in partially observable, continuous-state-and-action worlds where we have abstract prior knowledge about the structure of the optimal policy in the form of a distribution over policies. Using ideas from planning-as-inference reductions and Bayesian unsupervised learning, we cast Markov Chain Monte Carlo as a stochastic, hill-climbing policy search algorithm. Importantly, this algorithm's search bias is directly tied to the prior and its MCMC proposal kernels, which means we can draw on the full Bayesian toolbox to express the search bias, including nonparametric priors and structured, recursive processes like grammars over action sequences. Furthermore, we can reason about uncertainty in the search bias itself by constructing a hierarchical prior and reasoning about latent variables that determine the abstract structure of the policy. This yields an adaptive search algorithm: our algorithm learns to learn a structured policy efficiently. We show how inference over the latent variables in these policy priors enables intra- and intertask transfer of abstract knowledge. We demonstrate the flexibility of this approach by learning meta search biases, by constructing a nonparametric finite state controller to model memory, by discovering motor primitives using a simple grammar over primitive actions, and by combining all three.
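To make the planning-as-inference reduction concrete, the following is a minimal sketch of MCMC as stochastic, hill-climbing policy search, not the paper's implementation. It assumes a symmetric proposal kernel and a Monte Carlo return estimator (e.g., averaged rollouts); the names log_prior, propose, and estimate_return are hypothetical placeholders for user-supplied components. The unnormalized posterior weights the policy prior by the exponentiated return, so Metropolis-Hastings tends to move uphill in return while the prior and the proposal kernel supply the search bias.

```python
import math
import random

def mh_policy_search(log_prior, propose, estimate_return,
                     init_policy, n_iters=1000, temperature=1.0):
    """Metropolis-Hastings over policies.

    Planning-as-inference scores a policy by
        log p(policy | optimality) ∝ log_prior(policy) + return(policy) / temperature,
    so sampling from this target behaves as stochastic hill-climbing on
    expected return, with the prior and proposal kernel encoding the
    search bias.
    """
    policy = init_policy
    score = log_prior(policy) + estimate_return(policy) / temperature
    best_policy, best_score = policy, score
    for _ in range(n_iters):
        candidate = propose(policy)  # structured proposal kernel (assumed symmetric)
        cand_score = log_prior(candidate) + estimate_return(candidate) / temperature
        # Metropolis acceptance test; min(0, .) keeps exp() from overflowing
        if random.random() < math.exp(min(0.0, cand_score - score)):
            policy, score = candidate, cand_score
            if score > best_score:
                best_policy, best_score = policy, score
    return best_policy
```

In this sketch, the structured priors the abstract mentions (a grammar over action sequences, a nonparametric finite state controller) would enter through log_prior and propose, and a hierarchical prior would simply add latent structure variables to the state being sampled.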
