Abstract
This paper proposes a method to modify a traditional convolutional neural network (CNN) into an interpretable CNN, in order to clarify the knowledge representations in its high conv-layers. In an interpretable CNN, each filter in a high conv-layer represents a specific object part.
Our interpretable CNN uses the same training data as an ordinary CNN, without any annotations of object parts or textures for supervision. During the learning process, the interpretable CNN automatically assigns an object part to each filter in a high conv-layer. Our method can be applied to different types of CNNs with various structures. The explicit knowledge representation in an interpretable CNN can help people understand the logic inside the CNN, i.e., which patterns the CNN memorizes for prediction. Experiments show that filters in an interpretable CNN are more semantically meaningful than those in a traditional CNN. The code is available at https://github.com/zqs1022/interpretableCNN.