资源论文A Bayesian Approach for Policy Learning from Trajectory Preference Queries

A Bayesian Approach for Policy Learning from Trajectory Preference Queries

2020-01-13 | |  61 |   69 |   0

Abstract

We consider the problem of learning control policies via trajectory preference queries to an expert. In particular, the agent presents an expert with short runs of a pair of policies originating from the same state and the expert indicates which trajectory is preferred. The agent’s goal is to elicit a latent target policy from the expert with as few queries as possible. To tackle this problem we propose a novel Bayesian model of the querying process and introduce two methods that exploit this model to actively select expert queries. Experimental results on four benchmark problems indicate that our model can effectively learn policies from trajectory preference queries and that active query selection can be substantially more efficient than random selection.

上一篇:Link Prediction in Graphs with Autoregressive Features

下一篇:Multi-scale Hyper-time Hardware Emulation of Human Motor Nervous System Based on Spiking Neurons using FPGA

用户评价
全部评价

热门资源

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...