Abstract. When intelligent agents learn visuomotor behaviors from human demonstrations, they may benefit from knowing where the human is allocating visual attention, which can be inferred from the human's gaze. Human gaze allocation conveys a wealth of information about intelligent decision making; hence, exploiting this information has the potential to improve the agents' performance. With this motivation, we
propose the AGIL (Attention Guided Imitation Learning) framework.
We collect high-quality human action and gaze data from humans playing Atari
games in a carefully controlled experimental setting. Using these data, we
first train a deep neural network that can predict human gaze positions
and visual attention with high accuracy (the gaze network) and then
train another network to predict human actions (the policy network).
Incorporating the learned attention model from the gaze network into
the policy network significantly improves the action prediction accuracy
and task performance.
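
To make the two-stage pipeline concrete, below is a minimal sketch in PyTorch. Everything here is an illustrative assumption rather than the paper's actual implementation: the class names GazeNetwork and PolicyNetwork, the layer sizes, and the masking scheme (modulating the input frames by the predicted saliency map) are hypothetical stand-ins for the architectures described in the paper.

```python
# Hypothetical sketch of the two-network AGIL pipeline; layer sizes,
# names, and the masking scheme are illustrative assumptions.
import torch
import torch.nn as nn


class GazeNetwork(nn.Module):
    """Predicts a spatial attention (gaze saliency) map from stacked frames."""

    def __init__(self, in_channels=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1),  # one-channel saliency logits
        )

    def forward(self, frames):
        logits = self.net(frames)                         # (B, 1, H', W')
        b, _, h, w = logits.shape
        probs = torch.softmax(logits.view(b, -1), dim=1)  # spatial distribution
        return probs.view(b, 1, h, w)


class PolicyNetwork(nn.Module):
    """Predicts human actions from raw frames and attention-masked frames."""

    def __init__(self, n_actions, in_channels=4):
        super().__init__()

        def branch():
            return nn.Sequential(
                nn.Conv2d(in_channels, 32, 8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
                nn.Flatten(),
            )

        self.raw, self.masked = branch(), branch()
        self.head = nn.LazyLinear(n_actions)  # logits over the action set

    def forward(self, frames, attention):
        # Upsample the saliency map and modulate the input frames with it,
        # so one branch sees raw frames and the other attention-masked frames.
        attn = nn.functional.interpolate(attention, size=frames.shape[-2:])
        features = torch.cat(
            [self.raw(frames), self.masked(frames * attn)], dim=1
        )
        return self.head(features)


# Example usage (hypothetical shapes):
#   gaze, policy = GazeNetwork(), PolicyNetwork(n_actions=18)
#   attn = gaze(frames)                    # frames: (B, 4, 84, 84)
#   action_logits = policy(frames, attn)   # (B, 18)
```

The training order mirrors the abstract: the gaze network is fit to human gaze data first, and its predicted attention map is then fed into the policy network while that network learns to predict human actions.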