Abstract
Classically, imitation learning algorithms have
been developed for idealized situations; for example,
demonstrations are often required to be collected
in the exact same environment and usually include
the demonstrator’s actions. Recently, however, the
research community has begun to address some of
these shortcomings by offering algorithmic solutions that enable imitation learning from observation (IfO), e.g., learning to perform a task from visual demonstrations that may be in a different environment and do not include actions. Motivated by
the fact that agents often also have access to their
own internal states (i.e., proprioception), we propose and study an IfO algorithm that leverages this
information in the policy learning process. The proposed architecture learns policies over proprioceptive state representations and compares the resulting trajectories visually to the demonstration data.
We experimentally test the proposed technique on
several MuJoCo domains and show that it outperforms other imitation-from-observation algorithms
by a large margin.