Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks
Abstract
In an effort to understand the meaning of the intermediate representations captured by deep networks, recent papers have tried to associate specific semantic concepts with individual neural network filter responses, and have often found interesting correlations, largely by focusing on extremal filter responses. In this paper, we show that this approach can favor easy-to-interpret cases that are not necessarily representative of the average behavior of a representation.
A more realistic but harder-to-study hypothesis is that semantic representations are distributed, and thus filters must be studied in conjunction. To investigate this idea while enabling systematic visualization and quantification of multiple filter responses, we introduce the Net2Vec framework, in which semantic concepts are mapped to vectorial embeddings based on the corresponding filter responses. By studying such embeddings, we show that 1) in most cases, multiple filters are required to code for a concept; 2) filters are often not concept-specific and help encode multiple concepts; and 3) compared to single filter activations, filter embeddings better characterize the meaning of a representation and its relationship to other concepts.
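The abstract does not say how a concept is turned into a vector of filter weights. The following is only a minimal sketch, assuming that each concept is embedded as the weight vector of a logistic-regression probe over spatially pooled filter responses (one weight per filter), and that concepts are compared by cosine similarity. The function names `fit_concept_embedding` and `cosine`, the learning rate, the epoch count, and the synthetic data are all hypothetical illustrations, not details taken from the text above.

```python
import numpy as np


def fit_concept_embedding(acts, labels, lr=0.5, epochs=300):
    """Fit a hypothetical concept embedding (w, b) with a logistic-regression probe.

    acts:   (N, K) array, one pooled response per filter for each of N inputs.
    labels: (N,) array of 0/1 flags marking whether the concept is present.
    Returns the weight vector w (one weight per filter) and a scalar bias b.
    """
    n, k = acts.shape
    w = np.zeros(k)
    b = 0.0
    for _ in range(epochs):
        # Predicted probability that the concept is present.
        p = 1.0 / (1.0 + np.exp(-(acts @ w + b)))
        err = p - labels  # gradient of mean binary cross-entropy w.r.t. the logits
        w -= lr * (acts.T @ err) / n
        b -= lr * err.mean()
    return w, b


def cosine(u, v):
    """Cosine similarity, used here to compare two concept embeddings."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Synthetic example (assumed, not real network data): 4 "filters";
# concept A depends on filters 0 and 1, concept B on filters 1 and 2,
# so filter 1 is shared by both concepts.
rng = np.random.default_rng(0)
acts = rng.standard_normal((2000, 4))
labels_a = (acts[:, 0] + acts[:, 1] > 0).astype(float)
labels_b = (acts[:, 1] + acts[:, 2] > 0).astype(float)

w_a, _ = fit_concept_embedding(acts, labels_a)
w_b, _ = fit_concept_embedding(acts, labels_b)

print(np.round(w_a, 2), np.round(w_b, 2))
print(round(cosine(w_a, w_b), 2))
```

In this toy setup, concept A's embedding places weight on two filters rather than one, filter 1 carries weight in both embeddings, and the cosine similarity between the two weight vectors offers one way to relate concepts to each other. This loosely mirrors the three findings listed above, under the stated assumptions only.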