Five-minute Demo on CIFAR-10
Abstract
We propose the decision tree latent controller generative adversarial network (DTLC-GAN), an extension of a GAN that can learn hierarchically interpretable representations without relying on detailed supervision. To impose a hierarchical inclusion structure on latent variables, we incorporate a new architecture called the DTLC into the generator input. The DTLC has a multiple-layer tree structure in which the on/off activation of the child node codes is controlled by the parent node codes. By applying this gating hierarchically, we obtain a latent space in which the lower layer codes are used selectively depending on the higher layer ones. To make the latent codes capture salient semantic features of images in a hierarchically disentangled manner in the DTLC, we also propose hierarchical conditional mutual information regularization (HCMI) and optimize it with a newly defined curriculum learning method. This makes it possible to discover hierarchically interpretable representations layer by layer on the basis of information gain using only a single DTLC-GAN model. We evaluated the DTLC-GAN on various datasets, i.e., MNIST, CIFAR-10, Tiny ImageNet, 3D Faces, and CelebA, and confirmed that it can learn hierarchically interpretable representations in both unsupervised and weakly supervised settings. Furthermore, we applied the DTLC-GAN to image-retrieval tasks and showed its effectiveness in representation learning.
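To make the hierarchical inclusion structure concrete, the following is a minimal NumPy sketch of how a two-layer DTLC latent code could be composed: a one-hot parent code selects which child code block stays on, and the child blocks under the off parent nodes are zeroed. The layer dimensions (`k1`, `k2`) and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_hot(k, idx):
    """Return a k-dimensional one-hot vector with a 1 at position idx."""
    v = np.zeros(k)
    v[idx] = 1.0
    return v

def sample_dtlc_code(k1=3, k2=2):
    """Sample an illustrative two-layer DTLC latent code.

    The first-layer (parent) code is a k1-way one-hot vector.
    Each parent node owns a k2-way one-hot child code; only the
    child block under the ON parent node is kept, and the blocks
    under the OFF parent nodes are zeroed, which gives the
    hierarchical inclusion structure described in the abstract.
    """
    parent = one_hot(k1, rng.integers(k1))
    children = np.zeros((k1, k2))
    for i in range(k1):
        children[i] = one_hot(k2, rng.integers(k2))
    # Parent gates children: OFF parents zero out their child codes.
    gated = children * parent[:, None]
    return np.concatenate([parent, gated.ravel()])

code = sample_dtlc_code()
```

In this sketch, exactly one parent node and exactly one of its children are active in every sampled code; extending to deeper trees would repeat the same gating per layer.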
Motivation of DTLC-GAN
We address the problem of how to derive hierarchically interpretable representations in a deep generative model. To solve this problem, we propose the DTLC-GAN, an extension of the GAN that can learn hierarchically interpretable representations without relying on detailed supervision. Figure 1 shows examples of image generation under control using the DTLC-GAN. If semantic features are represented in a hierarchically disentangled manner, we can approach a target image gradually and interactively.
Relationship to Previous GANs
The DTLC-GAN is a general framework, and we can view it as a natural extension of previous GANs. In particular, the InfoGAN [4] and CFGAN [5] (our previous work [project page]) are closely related to the DTLC-GAN in that they also discover hidden representations on the basis of information gain; however, they are limited to learning a single layer of hidden representations. We developed the DTLC-GAN to overcome this limitation. We summarize the relationship in Table 1.
| # of Hidden Layers | Unsupervised | (Weakly) Supervised |
| --- | --- | --- |
| 0 | GAN [1] | CGAN [2], AC-GAN [3] |
| 1 | InfoGAN [4] | CFGAN [5] |
| 2, 3, 4, ... | DTLC-GAN | DTLC-GAN^WS |
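As context for the information-gain connection above, the one-layer case (InfoGAN) maximizes a variational lower bound on the mutual information between a latent code and the generated image, estimated by an auxiliary head Q. The sketch below shows that bound for a categorical code as the negative cross-entropy between the sampled one-hot code and Q's softmax output; this is the standard InfoGAN-style term, not the DTLC-GAN's exact HCMI objective, and the names and shapes are illustrative.

```python
import numpy as np

def mi_lower_bound(code_onehot, q_logits):
    """Variational lower bound E[log Q(c | x)] on I(c; G(z, c)) for a
    categorical code c, up to a constant (the entropy of c).

    code_onehot: [batch, k] one-hot samples of the latent code.
    q_logits:    [batch, k] raw outputs of the auxiliary head Q(c | x).
    Returns the batch-averaged log-likelihood Q assigns to the true code.
    """
    # Log-softmax of Q's logits.
    logp = q_logits - np.log(np.exp(q_logits).sum(axis=1, keepdims=True))
    # Pick out log Q(c | x) for the sampled code and average over the batch.
    return float((code_onehot * logp).sum(axis=1).mean())
```

The bound is maximal (approaching 0 here) when Q recovers the code perfectly, and equals -log k when Q is uniform; the DTLC-GAN applies this kind of regularization conditionally, layer by layer, following its curriculum.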
Demonstrations
One-minute Demo on MNIST
One-minute Demo on 3D Faces
One-minute Demo on CelebA
Implementations
Third-party implementations of the DTLC-GAN.
If you would like yours to be added, please contact us.
References
[1] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, Generative Adversarial Nets. Advances in Neural Information Processing Systems (NIPS), 2014.
[2] Mehdi Mirza and Simon Osindero, Conditional Generative Adversarial Nets. arXiv, 2014.
[3] Augustus Odena, Christopher Olah, and Jonathon Shlens, Conditional Image Synthesis with Auxiliary Classifier GANs. International Conference on Machine Learning (ICML), 2017.
[4] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel, InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. Advances in Neural Information Processing Systems (NIPS), 2016.
[5] Takuhiro Kaneko, Kaoru Hiramatsu, and Kunio Kashino, Generative Attribute Controller with Conditional Filtered Generative Adversarial Networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
Contact
Takuhiro Kaneko
NTT Communication Science Laboratories, NTT Corporation
takuhiro.kaneko.tb at hco.ntt.co.jp