Abstract. Explanations of the decisions made by a deep neural network
are important for human end-users to understand and diagnose the trustworthiness of the system. Current neural networks used
for visual recognition generally operate as black boxes that do not
provide any human-interpretable justification for a prediction. In this
work we propose a new framework called Interpretable Basis Decomposition for providing visual explanations for classification networks. By
decomposing the neural activations of the input image into semantically
interpretable components pre-trained from a large concept corpus, the
proposed framework is able to disentangle the evidence encoded in the
activation feature vector, and quantify the contribution of each piece of
evidence to the final prediction. We apply our framework to provide
explanations for several popular networks for visual recognition, and show
that it is able to explain the predictions given by the networks in a human-interpretable way. The human interpretability of the visual explanations
produced by our framework and other recent explanation methods is evaluated through Amazon Mechanical Turk, showing that our framework
generates more faithful and interpretable explanations.
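To make the decomposition idea concrete, the following is a minimal sketch of how the class evidence carried by an activation feature vector can be split over a set of pre-trained concept vectors, with each concept's contribution to the prediction scored. The variable names, sizes, and the greedy projection procedure are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
import numpy as np

# Illustrative sketch only: decompose a classifier weight vector into a
# non-negative combination of pre-trained concept vectors, then score how
# much each selected concept contributes to the class logit of one image.
rng = np.random.default_rng(0)

num_concepts, feat_dim = 50, 512                           # hypothetical sizes
concept_basis = rng.normal(size=(num_concepts, feat_dim))  # rows: concept detector vectors
class_weight = rng.normal(size=feat_dim)                    # classifier weight of the predicted class
feature = rng.normal(size=feat_dim)                         # pooled activation vector of the input image

# Greedy non-negative decomposition of the class weight onto the concept basis
# (a stand-in for the paper's decomposition step; the residual holds what remains unexplained).
residual = class_weight.copy()
coeffs = np.zeros(num_concepts)
for _ in range(5):                                          # keep only a few concepts for readability
    scores = concept_basis @ residual
    k = int(np.argmax(scores))
    if scores[k] <= 0:
        break
    step = scores[k] / np.dot(concept_basis[k], concept_basis[k])
    coeffs[k] += step
    residual -= step * concept_basis[k]

# Contribution of each concept to the class score: coefficient times the concept's response on the image.
contributions = coeffs * (concept_basis @ feature)
for k in np.argsort(contributions)[::-1]:
    if coeffs[k] > 0:
        print(f"concept {k}: contribution {contributions[k]:+.3f}")
```

Under this reading, the per-concept contributions sum (together with a residual term) to the class score, which is what allows the evidence behind a prediction to be itemized in human-interpretable units.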