Abstract
Distribution and sample models are two popular model choices in model-based reinforcement learning (MBRL). However, learning these models can be intractable, particularly when the state and action spaces are large. Expectation models, on the other hand, are relatively easy to learn due to their compactness, and they have also been widely used for deterministic environments. For stochastic environments, it is not obvious how expectation models can be used for planning, as they only partially characterize a distribution. In this paper, we propose a sound way of using expectation models for MBRL.
In particular, we 1) show that planning with an expectation model is equivalent to planning with a
distribution model if the state value function is linear in state-feature vector, 2) analyze two common
parametrization choices for approximating the expectation: linear and non-linear expectation models, 3) propose a sound model-based policy evaluation algorithm and present its convergence results,
and 4) empirically demonstrate the effectiveness of
the proposed planning algorithm