Abstract
We introduce a novel Inverse Reinforcement Learning (IRL) method for batch settings where only expert demonstrations are given and no interaction
with the environment is allowed. Such settings
are common in health care, finance, and education,
where environmental dynamics are unknown and
no reliable simulator exists. Unlike existing IRL
methods, our method does not require on-policy
roll-outs or assume access to non-expert data. We
introduce a robust off-policy estimator of the feature expectations of any policy, and we propose an IRL warm-start strategy that jointly learns a near-expert initial policy and an expressive feature representation directly from data; together, these render batch IRL feasible. We demonstrate
our method’s superior performance in batch settings on both classical control tasks and a real-world clinical task of sepsis management in the ICU.