Abstract
AI agents support high-stakes decision-making processes, from driving cars to prescribing drugs, making it increasingly important for human users to
understand their behavior. Policy summarization
methods aim to convey strengths and weaknesses
of such agents by demonstrating their behavior in
a subset of informative states. Some policy summarization methods extract a summary that optimizes the ability to reconstruct the agent’s policy
under the assumption that users will apply inverse
reinforcement learning. In this paper, we explore
the use of different user models for extracting summaries. We introduce an imitation learning-based
approach to policy summarization; we demonstrate
through computational simulations that a mismatch
between the model used to extract a summary and
the model used to reconstruct the policy results in
worse reconstruction quality; and we demonstrate
through a human-subject study that people use different models to reconstruct policies in different
contexts, and that matching the summary extraction
model to the user’s reconstruction model can improve performance. Together,
our results suggest that it is important to carefully
consider user models in policy summarization.