Abstract
This paper proposes the decision tree latent controller
generative adversarial network (DTLC-GAN), an extension
of a GAN that can learn hierarchically interpretable representations without relying on detailed supervision. To impose a hierarchical inclusion structure on latent variables,
we incorporate a new architecture called the DTLC into the
generator input. The DTLC has a multi-layer tree structure in which the parent node codes
switch the child node codes ON or OFF. By applying this architecture hierarchically, we obtain a latent space in which
the lower-layer codes are selectively used depending on the
higher-layer ones. To make the latent codes capture salient
semantic features of images in a hierarchically disentangled
manner in the DTLC, we also propose a hierarchical conditional mutual information regularization and optimize it
with a newly defined curriculum learning method.
This makes it possible to discover hierarchically interpretable representations in a layer-by-layer
manner on the basis of information gain using only a
single DTLC-GAN model. We evaluated the DTLC-GAN
on various datasets, i.e., MNIST, CIFAR-10, Tiny ImageNet,
3D Faces, and CelebA, and confirmed that the DTLC-GAN
can learn hierarchically interpretable representations in
either unsupervised or weakly supervised settings. Furthermore, we applied the DTLC-GAN to image-retrieval tasks
and showed its effectiveness in representation learning.
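
The parent-controlled ON/OFF gating described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work: it assumes every node holds a one-hot categorical code, that each element of a parent code gates exactly one child node, and that the gated codes are concatenated with Gaussian noise to form the generator input. The names `sample_dtlc`, `generator_input`, `ks`, and `noise_dim` are hypothetical and introduced only for illustration.

```python
import numpy as np


def sample_dtlc(ks, rng):
    """Sample gated latent codes for a tree with len(ks) layers.

    ks[l] is the (assumed) number of categories per node at layer l.
    Returns one array per layer with shape (n_nodes, ks[l]); rows that
    belong to switched-OFF nodes are all zeros.
    """
    layers = []
    gate = np.ones(1)  # the single root node is always ON
    for k in ks:
        n_nodes = gate.shape[0]
        idx = rng.integers(0, k, size=n_nodes)  # one category per node
        onehot = np.eye(k)[idx]                 # shape (n_nodes, k)
        codes = onehot * gate[:, None]          # zero out OFF nodes
        layers.append(codes)
        # Each element of this layer's codes gates one child node below.
        gate = codes.reshape(-1)
    return layers


def generator_input(ks, noise_dim, rng):
    """Concatenate the flattened gated codes with a Gaussian noise vector."""
    layers = sample_dtlc(ks, rng)
    z = rng.standard_normal(noise_dim)
    return np.concatenate([c.reshape(-1) for c in layers] + [z])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for depth, codes in enumerate(sample_dtlc([2, 3, 3], rng), start=1):
        print(f"layer {depth}: active node index = {int(codes.sum(axis=1).argmax())}")
```

Under these assumptions exactly one node per layer is ON, so choosing a higher-layer category determines which lower-layer code is in use, which is the hierarchical inclusion structure the DTLC is meant to impose.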