Abstract
Many studies have shown that generative adversarial networks (GANs) discover semantics at various levels of abstraction, yet GANs provide no intuitive way to expose or control these semantics. Both supervised and unsupervised approaches have been proposed to identify interpretable directions in a GAN's latent space. Supervised methods, however, can only find directions consistent with their supervision signals, while many current unsupervised methods suffer from varying degrees of semantic entanglement. This paper proposes an unsupervised method with a layer-wise design: a subspace is embedded in each generator layer to capture disentangled, interpretable semantics, and a latent mapping network maps the input to an intermediate latent space with rich disentangled semantics. In addition, an Orthogonal Jacobian regularization is applied to constrain the overall input, further enhancing disentanglement. Experiments on human-face, anime-face, and scene datasets demonstrate the method's applicability and its efficacy in finding interpretable directions. Compared with existing unsupervised methods, the proposed method achieves clear improvements in disentanglement, both qualitatively and quantitatively.
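The Orthogonal Jacobian regularization mentioned in the abstract penalizes overlap between the image-space changes induced by different latent dimensions: if the Jacobian columns of the generator are mutually orthogonal, each latent dimension controls an independent factor of variation. The following is a minimal numpy sketch of that idea, not the paper's implementation; the finite-difference Jacobian and the toy generators `g_disentangled` and `g_entangled` are illustrative assumptions.

```python
import numpy as np

def jacobian_fd(g, z, eps=1e-4):
    """Finite-difference Jacobian of a generator g at latent code z."""
    out = g(z)
    J = np.zeros((out.size, z.size))
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (g(z + dz) - g(z - dz)) / (2 * eps)
    return J

def ojr_penalty(g, z):
    """Off-diagonal energy of J^T J: zero iff the Jacobian columns
    (the per-dimension image changes) are mutually orthogonal."""
    J = jacobian_fd(g, z)
    gram = J.T @ J
    off = gram - np.diag(np.diag(gram))
    return np.sum(off ** 2)

# Toy "generators": one with orthogonal factors, one entangled.
g_disentangled = lambda z: np.array([2.0 * z[0], 3.0 * z[1]])
g_entangled = lambda z: np.array([z[0] + z[1], z[0] - 0.5 * z[1]])

z = np.array([0.3, -0.7])
print(ojr_penalty(g_disentangled, z))  # near zero: columns orthogonal
print(ojr_penalty(g_entangled, z))     # positive: columns overlap
```

In practice such a penalty is added to the generator loss during training, driving each latent dimension toward controlling a distinct semantic attribute.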
H. Hu, X. Zhou, X. Huo and B. Zhang—Contributing authors.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under grant 62072169, and Natural Science Foundation of Hunan Province under grant 2021JJ30138.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Hu, H., Jiang, B., Zhou, X., Huo, X., Zhang, B. (2022). Unsupervised Discovery of Disentangled Interpretable Directions for Layer-Wise GAN. In: Li, T., et al. Big Data. BigData 2022. Communications in Computer and Information Science, vol 1709. Springer, Singapore. https://doi.org/10.1007/978-981-19-8331-3_2
Print ISBN: 978-981-19-8330-6
Online ISBN: 978-981-19-8331-3