Abstract. We introduce a new method for interpreting computer vision models: visually perceptible, decision-boundary crossing transformations. Our goal is to answer a simple question: why did a model classify an image as being of class A instead of class B? Existing approaches
to model interpretation, including saliency maps and explanation-by-nearest-neighbor, fail to visually illustrate the transformations required
for a specific input to alter a model’s prediction. On the other hand, algorithms for creating decision-boundary crossing transformations (e.g.,
adversarial examples) produce differences that are visually imperceptible and do not yield insightful explanations. To address this, we introduce ExplainGAN, a generative model that produces visually perceptible
decision-boundary crossing transformations. These transformations provide high-level conceptual insights that illustrate how a model makes
decisions. We validate our model using traditional quantitative interpretation metrics and also introduce a new validation scheme for our approach and for generative models more generally.