Abstract
We introduce a novel Inverse Reinforcement Learning (IRL) method for batch settings where only expert demonstrations are given and no interaction
with the environment is allowed. Such settings
are common in health care, finance, and education,
where environmental dynamics are unknown and
no reliable simulator exists. Unlike existing IRL
methods, our method does not require on-policy
roll-outs or assume access to non-expert data. We
introduce a robust off-policy estimator of the feature expectations of any policy, and we propose an IRL warm-start strategy that jointly learns a near-expert initial policy and an expressive feature representation directly from data; together, these render batch IRL feasible. We demonstrate
our method’s superior performance in batch settings on both classical control tasks and a real-world clinical task of sepsis management in the ICU.