Abstract
Imitation learning aims to derive a mapping from
states to actions, i.e., a policy, from expert demonstrations. Existing imitation learning methods typically require all actions in the demonstrations to be available, which is hard to ensure
in real applications. Though algorithms for learning with unobservable actions have been proposed,
they focus solely on state information and overlook the fact that the action sequence could still
be partially available and provide useful information for policy learning. In this paper, we propose
a novel algorithm called Action-Guided Adversarial Imitation Learning (AGAIL) that learns a policy from demonstrations with incomplete action sequences, i.e., incomplete demonstrations. The core
idea of AGAIL is to separate demonstrations into
state and action trajectories, and train a policy with
state trajectories while using actions as auxiliary
information to guide the training whenever they are available. Built upon Generative Adversarial Imitation Learning (GAIL), AGAIL has three components: a
generator, a discriminator, and a guide. The generator learns a policy with rewards provided by
the discriminator, which tries to distinguish the state
distribution of the demonstrations from that of samples
generated by the policy. The guide provides additional rewards to the generator when demonstrated
actions for specific states are available. We compare AGAIL to other methods on benchmark tasks
and show that AGAIL consistently delivers performance comparable to state-of-the-art methods
even when the action sequences in the demonstrations are
only partially available.
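
As a rough illustration of how the three components could interact, the sketch below combines a state-only discriminator reward with a guide reward that is applied only at steps where the demonstrated action is available. It is a minimal sketch under stated assumptions, not the formulation used by AGAIL: the function name combined_reward, the GAIL-style reward form -log(1 - D(s)), the guide_score input, and the guide_weight parameter are all hypothetical and chosen for illustration.

```python
import numpy as np


def combined_reward(disc_prob, guide_score, action_available, guide_weight=0.5):
    """Per-step reward for the generator (illustrative only).

    disc_prob        : assumed D(s) in (0, 1), the discriminator's belief that
                       state s comes from the demonstrations.
    guide_score      : hypothetical per-step score from the guide, e.g. how well
                       the policy's action agrees with the demonstrated action.
    action_available : boolean mask, True where the demonstration contains an
                       action for that state.
    guide_weight     : hypothetical trade-off between the two reward terms.
    """
    disc_prob = np.asarray(disc_prob, dtype=float)
    guide_score = np.asarray(guide_score, dtype=float)
    mask = np.asarray(action_available, dtype=bool)

    eps = 1e-8  # avoids log(0) when the discriminator saturates
    # Assumed GAIL-style reward computed from states alone.
    disc_reward = -np.log(1.0 - disc_prob + eps)
    # The guide contributes only where a demonstrated action exists.
    guide_reward = np.where(mask, guide_score, 0.0)
    return disc_reward + guide_weight * guide_reward


# Two steps: the first has a demonstrated action, the second does not.
rewards = combined_reward([0.5, 0.9], [1.0, 1.0], [True, False])
print(rewards)
```

Masking the guide term, rather than dropping steps that lack actions, lets every state still contribute through the discriminator reward, which matches the idea of training on state trajectories and using actions only as auxiliary information.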