Abstract
Generation of photo-realistic images, semantic editing and representation learning are only a few of the many applications of high-resolution generative models. Recent progress in GANs has established them as an excellent choice for such tasks. However, since they do not provide an inference model, downstream tasks such as classification cannot easily be applied to real images using the GAN latent space. Despite numerous efforts to train an inference model or to design an iterative method that inverts a pre-trained generator, previous methods are specific to a dataset (e.g. human face images) and an architecture (e.g. StyleGAN), and are nontrivial to extend to novel datasets or architectures. We propose a general framework that is agnostic to both architecture and dataset. Our key insight is that, by training the inference model and the generative model together, we allow them to adapt to each other and to converge to a higher-quality model. Our InvGAN, short for Invertible GAN, successfully embeds real images in the latent space of a high-quality generative model. This allows us to perform image inpainting, merging, interpolation and online data augmentation. We demonstrate this with extensive qualitative and quantitative experiments.
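To make the joint-training idea concrete, below is a minimal PyTorch sketch of an encoder (the inference model), a generator and a discriminator trained together: the discriminator scores real, sampled and reconstructed images, while the encoder and generator jointly minimize an adversarial loss plus a reconstruction loss. The tiny architectures, the pixel-space MSE reconstruction term and the loss weight are illustrative assumptions for exposition, not the paper's exact InvGAN objective.

import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 64  # size of the shared latent space (an arbitrary choice here)

class Encoder(nn.Module):
    """Inference model: image -> latent code."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),   # 32x32 -> 16x16
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),  # 16x16 -> 8x8
            nn.Flatten(), nn.Linear(64 * 8 * 8, LATENT_DIM))
    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):
    """Generative model: latent code -> image."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(LATENT_DIM, 64 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),  # 8x8 -> 16x16
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh())   # 16x16 -> 32x32
    def forward(self, z):
        return self.net(self.fc(z).view(-1, 64, 8, 8))

class Discriminator(nn.Module):
    """Image -> real/fake logit."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.Linear(64 * 8 * 8, 1))
    def forward(self, x):
        return self.net(x)

E, G, D = Encoder(), Generator(), Discriminator()
opt_eg = torch.optim.Adam(list(E.parameters()) + list(G.parameters()), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):
    """One joint update. D is trained to reject both sampled and
    reconstructed images; E and G are trained to fool D while also
    reconstructing real images."""
    n = real.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)
    fake = G(torch.randn(n, LATENT_DIM))  # image sampled from the prior
    recon = G(E(real))                    # reconstruction of a real image

    # Discriminator update (detach so no gradients reach E or G).
    d_loss = (bce(D(real), ones)
              + bce(D(fake.detach()), zeros)
              + bce(D(recon.detach()), zeros))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Joint encoder/generator update: adversarial + reconstruction terms.
    adv = bce(D(fake), ones) + bce(D(recon), ones)
    rec = F.mse_loss(recon, real)  # a perceptual loss is a common alternative
    eg_loss = adv + 10.0 * rec     # the 10.0 weight is an illustrative value
    opt_eg.zero_grad()
    eg_loss.backward()
    opt_eg.step()
    return d_loss.item(), eg_loss.item()

# Usage on a dummy batch of 32x32 images.
d_loss, eg_loss = train_step(torch.randn(8, 3, 32, 32))

Because the encoder is trained against the same discriminator as the generator, reconstructions are pushed onto the generator's image manifold rather than merely minimizing pixel error; this is one plausible reading of how training the two models together lets them adapt to each other.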
Acknowledgement
We thank Alex Vorobiov, Javier Romero, Betty Mohler Tesch and Soubhik Sanyal for their insightful comments and intriguing discussions. While PG and DZ are affiliated with the Max Planck Institute for Intelligent Systems, this project was completed during their internships at Amazon. MJB performed this work while at Amazon.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Ghosh, P., Zietlow, D., Black, M.J., Davis, L.S., Hu, X. (2022). InvGAN: Invertible GANs. In: Andres, B., Bernard, F., Cremers, D., Frintrop, S., Goldlücke, B., Ihrke, I. (eds) Pattern Recognition. DAGM GCPR 2022. Lecture Notes in Computer Science, vol 13485. Springer, Cham. https://doi.org/10.1007/978-3-031-16788-1_1
DOI: https://doi.org/10.1007/978-3-031-16788-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16787-4
Online ISBN: 978-3-031-16788-1