Abstract
Generation of photo-realistic images, semantic editing and representation learning are only a few of the many applications of high-resolution generative models. Recent progress in GANs has established them as an excellent choice for such tasks. However, since they do not provide an inference model, downstream tasks such as classification cannot easily be applied to real images using the GAN latent space. Despite numerous efforts to train an inference model or to design an iterative method that inverts a pre-trained generator, previous methods are specific to a dataset (e.g. human face images) and an architecture (e.g. StyleGAN), and are nontrivial to extend to novel datasets or architectures. We propose a general framework that is agnostic to both architecture and dataset. Our key insight is that, by training the inference model and the generative model together, we allow them to adapt to each other and to converge to a higher-quality model. Our InvGAN, short for Invertible GAN, successfully embeds real images in the latent space of a high-quality generative model. This allows us to perform image inpainting, merging, interpolation and online data augmentation. We demonstrate this with extensive qualitative and quantitative experiments.
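To make the joint-training idea concrete, below is a minimal PyTorch sketch of an encoder (the inference model), a generator and a discriminator trained together: the discriminator scores real, sampled and reconstructed images, while the encoder and generator jointly minimize an adversarial loss plus a reconstruction loss. The tiny architectures, the pixel-space MSE reconstruction term and the loss weight are illustrative assumptions for exposition, not the paper's exact InvGAN objective.

import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 64  # size of the shared latent space (an arbitrary choice here)

class Encoder(nn.Module):
    """Inference model: image -> latent code."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),   # 32x32 -> 16x16
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),  # 16x16 -> 8x8
            nn.Flatten(), nn.Linear(64 * 8 * 8, LATENT_DIM))
    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):
    """Generative model: latent code -> image."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(LATENT_DIM, 64 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),  # 8x8 -> 16x16
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh())   # 16x16 -> 32x32
    def forward(self, z):
        return self.net(self.fc(z).view(-1, 64, 8, 8))

class Discriminator(nn.Module):
    """Image -> real/fake logit."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.Linear(64 * 8 * 8, 1))
    def forward(self, x):
        return self.net(x)

E, G, D = Encoder(), Generator(), Discriminator()
opt_eg = torch.optim.Adam(list(E.parameters()) + list(G.parameters()), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):
    """One joint update. D is trained to reject both sampled and
    reconstructed images; E and G are trained to fool D while also
    reconstructing real images."""
    n = real.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)
    fake = G(torch.randn(n, LATENT_DIM))  # image sampled from the prior
    recon = G(E(real))                    # reconstruction of a real image

    # Discriminator update (detach so no gradients reach E or G).
    d_loss = (bce(D(real), ones)
              + bce(D(fake.detach()), zeros)
              + bce(D(recon.detach()), zeros))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Joint encoder/generator update: adversarial + reconstruction terms.
    adv = bce(D(fake), ones) + bce(D(recon), ones)
    rec = F.mse_loss(recon, real)  # a perceptual loss is a common alternative
    eg_loss = adv + 10.0 * rec     # the 10.0 weight is an illustrative value
    opt_eg.zero_grad()
    eg_loss.backward()
    opt_eg.step()
    return d_loss.item(), eg_loss.item()

# Usage on a dummy batch of 32x32 images.
d_loss, eg_loss = train_step(torch.randn(8, 3, 32, 32))

Because the encoder is trained against the same discriminator as the generator, reconstructions are pushed onto the generator's image manifold rather than merely minimizing pixel error; this is one plausible reading of how training the two models together lets them adapt to each other.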
Acknowledgement
We thank Alex Vorobiov, Javier Romero, Betty Mohler Tesch and Soubhik Sanyal for their insightful comments and intriguing discussions. While PG and DZ are affiliated with the Max Planck Institute for Intelligent Systems, this project was completed during their internships at Amazon. MJB performed this work while at Amazon.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Ghosh, P., Zietlow, D., Black, M.J., Davis, L.S., Hu, X. (2022). InvGAN: Invertible GANs. In: Andres, B., Bernard, F., Cremers, D., Frintrop, S., Goldlücke, B., Ihrke, I. (eds) Pattern Recognition. DAGM GCPR 2022. Lecture Notes in Computer Science, vol 13485. Springer, Cham. https://doi.org/10.1007/978-3-031-16788-1_1
DOI: https://doi.org/10.1007/978-3-031-16788-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16787-4
Online ISBN: 978-3-031-16788-1