Abstract
Many studies have shown that generative adversarial networks (GANs) discover semantics at various levels of abstraction, yet GANs provide no intuitive way to expose or control these semantics. Both supervised and unsupervised approaches have been proposed to identify interpretable directions in a GAN's latent space. Supervised methods, however, can only find directions consistent with their supervision signals, while many current unsupervised methods suffer from varying degrees of semantic entanglement. This paper proposes an unsupervised method with a layer-wise design: a subspace is embedded in each generator layer to capture disentangled, interpretable semantics, and a latent mapping network maps the input to an intermediate latent space with rich disentangled semantics. In addition, an Orthogonal Jacobian regularization is applied to constrain the overall input, further enhancing disentanglement. Experiments on human-face, anime-face, and scene datasets demonstrate the method's applicability and its efficacy in finding interpretable directions. Compared with existing unsupervised methods, the proposed method achieves clear improvements in disentanglement, both qualitatively and quantitatively.
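The Orthogonal Jacobian regularization mentioned in the abstract penalizes overlap between the image-space changes induced by different latent dimensions: if the Jacobian columns of the generator are mutually orthogonal, each latent dimension controls an independent factor of variation. The following is a minimal numpy sketch of that idea, not the paper's implementation; the finite-difference Jacobian and the toy generators `g_disentangled` and `g_entangled` are illustrative assumptions.

```python
import numpy as np

def jacobian_fd(g, z, eps=1e-4):
    """Finite-difference Jacobian of a generator g at latent code z."""
    out = g(z)
    J = np.zeros((out.size, z.size))
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (g(z + dz) - g(z - dz)) / (2 * eps)
    return J

def ojr_penalty(g, z):
    """Off-diagonal energy of J^T J: zero iff the Jacobian columns
    (the per-dimension image changes) are mutually orthogonal."""
    J = jacobian_fd(g, z)
    gram = J.T @ J
    off = gram - np.diag(np.diag(gram))
    return np.sum(off ** 2)

# Toy "generators": one with orthogonal factors, one entangled.
g_disentangled = lambda z: np.array([2.0 * z[0], 3.0 * z[1]])
g_entangled = lambda z: np.array([z[0] + z[1], z[0] - 0.5 * z[1]])

z = np.array([0.3, -0.7])
print(ojr_penalty(g_disentangled, z))  # near zero: columns orthogonal
print(ojr_penalty(g_entangled, z))     # positive: columns overlap
```

In practice such a penalty is added to the generator loss during training, driving each latent dimension toward controlling a distinct semantic attribute.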
H. Hu, X. Zhou, X. Huo and B. Zhang—Contributing authors.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under grant 62072169, and Natural Science Foundation of Hunan Province under grant 2021JJ30138.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Hu, H., Jiang, B., Zhou, X., Huo, X., Zhang, B. (2022). Unsupervised Discovery of Disentangled Interpretable Directions for Layer-Wise GAN. In: Li, T., et al. Big Data. BigData 2022. Communications in Computer and Information Science, vol 1709. Springer, Singapore. https://doi.org/10.1007/978-981-19-8331-3_2
Print ISBN: 978-981-19-8330-6
Online ISBN: 978-981-19-8331-3