Five-minute Demo on CIFAR-10
Abstract
We propose the decision tree latent controller generative adversarial network (DTLC-GAN), an extension of a GAN that can learn hierarchically interpretable representations without relying on detailed supervision. To impose a hierarchical inclusion structure on latent variables, we incorporate a new architecture called the DTLC into the generator input. The DTLC has a multiple-layer tree structure in which the on/off activation of the child node codes is controlled by the parent node codes. By applying this gating hierarchically, we obtain a latent space in which the lower layer codes are used selectively depending on the higher layer ones. To make the latent codes capture salient semantic features of images in a hierarchically disentangled manner in the DTLC, we also propose hierarchical conditional mutual information regularization (HCMI) and optimize it with a newly defined curriculum learning method. This makes it possible to discover hierarchically interpretable representations layer by layer on the basis of information gain using only a single DTLC-GAN model. We evaluated the DTLC-GAN on various datasets, i.e., MNIST, CIFAR-10, Tiny ImageNet, 3D Faces, and CelebA, and confirmed that it can learn hierarchically interpretable representations in both unsupervised and weakly supervised settings. Furthermore, we applied the DTLC-GAN to image-retrieval tasks and showed its effectiveness in representation learning.
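To make the hierarchical inclusion structure concrete, the following is a minimal NumPy sketch of how a two-layer DTLC latent code could be composed: a one-hot parent code selects which child code block stays on, and the child blocks under the off parent nodes are zeroed. The layer dimensions (`k1`, `k2`) and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_hot(k, idx):
    """Return a k-dimensional one-hot vector with a 1 at position idx."""
    v = np.zeros(k)
    v[idx] = 1.0
    return v

def sample_dtlc_code(k1=3, k2=2):
    """Sample an illustrative two-layer DTLC latent code.

    The first-layer (parent) code is a k1-way one-hot vector.
    Each parent node owns a k2-way one-hot child code; only the
    child block under the ON parent node is kept, and the blocks
    under the OFF parent nodes are zeroed, which gives the
    hierarchical inclusion structure described in the abstract.
    """
    parent = one_hot(k1, rng.integers(k1))
    children = np.zeros((k1, k2))
    for i in range(k1):
        children[i] = one_hot(k2, rng.integers(k2))
    # Parent gates children: OFF parents zero out their child codes.
    gated = children * parent[:, None]
    return np.concatenate([parent, gated.ravel()])

code = sample_dtlc_code()
```

In this sketch, exactly one parent node and exactly one of its children are active in every sampled code; extending to deeper trees would repeat the same gating per layer.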
Motivation of DTLC-GAN
We address the problem of how to derive hierarchically interpretable representations in a deep generative model. To solve this problem, we propose the DTLC-GAN, an extension of the GAN that can learn hierarchically interpretable representations without relying on detailed supervision. Figure 1 shows examples of image generation under control using the DTLC-GAN. If semantic features are represented in a hierarchically disentangled manner, we can approach a target image gradually and interactively.
Relationship to Previous GANs
The DTLC-GAN is a general framework, and we can view it as a natural extension of previous GANs. In particular, the InfoGAN [4] and CFGAN [5] (our previous work [project page]) are closely related to the DTLC-GAN in that they also discover hidden representations on the basis of information gain; however, they are limited to learning a single layer of hidden representations. We developed the DTLC-GAN to overcome this limitation. We summarize the relationship in Table 1.
| # of Hidden Layers | Unsupervised | (Weakly) Supervised |
| --- | --- | --- |
| 0 | GAN [1] | CGAN [2], AC-GAN [3] |
| 1 | InfoGAN [4] | CFGAN [5] |
| 2, 3, 4, ... | DTLC-GAN | DTLC-GAN^WS |
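As context for the information-gain connection above, the one-layer case (InfoGAN) maximizes a variational lower bound on the mutual information between a latent code and the generated image, estimated by an auxiliary head Q. The sketch below shows that bound for a categorical code as the negative cross-entropy between the sampled one-hot code and Q's softmax output; this is the standard InfoGAN-style term, not the DTLC-GAN's exact HCMI objective, and the names and shapes are illustrative.

```python
import numpy as np

def mi_lower_bound(code_onehot, q_logits):
    """Variational lower bound E[log Q(c | x)] on I(c; G(z, c)) for a
    categorical code c, up to a constant (the entropy of c).

    code_onehot: [batch, k] one-hot samples of the latent code.
    q_logits:    [batch, k] raw outputs of the auxiliary head Q(c | x).
    Returns the batch-averaged log-likelihood Q assigns to the true code.
    """
    # Log-softmax of Q's logits.
    logp = q_logits - np.log(np.exp(q_logits).sum(axis=1, keepdims=True))
    # Pick out log Q(c | x) for the sampled code and average over the batch.
    return float((code_onehot * logp).sum(axis=1).mean())
```

The bound is maximal (approaching 0 here) when Q recovers the code perfectly, and equals -log k when Q is uniform; the DTLC-GAN applies this kind of regularization conditionally, layer by layer, following its curriculum.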
Demonstrations
One-minute Demo on MNIST
One-minute Demo on 3D Faces
One-minute Demo on CelebA
Implementations
Third-party implementations of the DTLC-GAN.
If you would like yours to be added, please contact us.
References
[1] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, Generative Adversarial Nets. Advances in Neural Information Processing Systems (NIPS), 2014.
[2] Mehdi Mirza and Simon Osindero, Conditional Generative Adversarial Nets. arXiv, 2014.
[3] Augustus Odena, Christopher Olah, and Jonathon Shlens, Conditional Image Synthesis with Auxiliary Classifier GANs. International Conference on Machine Learning (ICML), 2017.
[4] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel, InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. Advances in Neural Information Processing Systems (NIPS), 2016.
[5] Takuhiro Kaneko, Kaoru Hiramatsu, and Kunio Kashino, Generative Attribute Controller with Conditional Filtered Generative Adversarial Networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
Contact
Takuhiro Kaneko
NTT Communication Science Laboratories, NTT Corporation
takuhiro.kaneko.tb at hco.ntt.co.jp