
Contrastive Learning for Diverse Disentangled Foreground Generation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13676)

Included in the conference series: Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

We introduce a new method for diverse foreground generation with explicit control over various factors. Existing image-inpainting-based foreground generation methods often struggle to produce diverse results and rarely allow users to explicitly control specific factors of variation (e.g., varying the facial identity or expression in face inpainting results). We leverage contrastive learning with latent codes to generate diverse foreground results for the same masked input. Specifically, we define two sets of latent codes, where one controls a pre-defined factor (“known”) and the other controls the remaining factors (“unknown”). Latent codes sampled from the two sets jointly bi-modulate the convolution kernels, guiding the generator to synthesize diverse results. Experiments demonstrate the superiority of our method over the state of the art in result diversity and generation controllability.
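The abstract names two mechanisms: convolution kernels jointly ("bi-") modulated by a "known" and an "unknown" latent code, and a contrastive objective that ties output variation to latent variation. The PyTorch sketch below illustrates one plausible reading of each; the class and function names, tensor shapes, and the StyleGAN2-style demodulation step are our assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiModulatedConv2d(nn.Module):
    """Convolution whose kernels are jointly modulated by two latent codes.

    Hypothetical sketch: one code controls the pre-defined ("known") factor,
    the other the remaining ("unknown") factors. Modulation/demodulation
    follows the StyleGAN2 recipe; the paper's exact layer may differ.
    """

    def __init__(self, in_ch: int, out_ch: int, z_dim: int, k: int = 3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k))
        # Separate affine maps turn each code into per-input-channel scales.
        self.affine_known = nn.Linear(z_dim, in_ch)
        self.affine_unknown = nn.Linear(z_dim, in_ch)
        self.padding = k // 2

    def forward(self, x, z_known, z_unknown):
        b, in_ch, h, w = x.shape
        out_ch, _, kh, kw = self.weight.shape
        # The product of the two scale vectors jointly modulates the kernels.
        s = self.affine_known(z_known) * self.affine_unknown(z_unknown)  # (B, in)
        weight = self.weight[None] * s[:, None, :, None, None]          # (B, out, in, k, k)
        # Demodulation keeps activation magnitudes stable across samples.
        demod = torch.rsqrt(weight.pow(2).sum(dim=(2, 3, 4)) + 1e-8)    # (B, out)
        weight = weight * demod[:, :, None, None, None]
        # Grouped-conv trick: a different kernel set per sample in the batch.
        out = F.conv2d(x.reshape(1, b * in_ch, h, w),
                       weight.reshape(b * out_ch, in_ch, kh, kw),
                       padding=self.padding, groups=b)
        return out.reshape(b, out_ch, h, w)


def latent_contrastive_loss(f1, f2, tau=0.07):
    """NT-Xent-style loss (a generic sketch, not the paper's exact objective):
    f1[i] and f2[i] are features of two generations sharing the same latent
    code (positives); every other pairing in the batch acts as a negative,
    so outputs from different codes are pushed apart, encouraging diversity.
    """
    f1, f2 = F.normalize(f1, dim=1), F.normalize(f2, dim=1)
    n = f1.shape[0]
    feats = torch.cat([f1, f2], dim=0)              # (2N, D)
    sim = feats @ feats.t() / tau                   # (2N, 2N) cosine logits
    sim.fill_diagonal_(float("-inf"))               # exclude self-similarity
    # Row i's positive sits at i+N (and vice versa).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets.to(sim.device))


if __name__ == "__main__":
    conv = BiModulatedConv2d(in_ch=64, out_ch=64, z_dim=128)
    x = torch.randn(4, 64, 32, 32)                  # masked-input features
    z_known, z_unknown = torch.randn(4, 128), torch.randn(4, 128)
    y = conv(x, z_known, z_unknown)                 # (4, 64, 32, 32)
    loss = latent_contrastive_loss(torch.randn(4, 256), torch.randn(4, 256))
```

Under this scheme, resampling z_unknown while holding z_known fixed would vary only the remaining factors, which is the disentangled control the abstract describes.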



Acknowledgement

This work was supported in part by Sony Focused Research Award, NSF CAREER IIS-2150012, Wisconsin Alumni Research Foundation, and NASA 80NSSC21K0295.

Author information


Corresponding author

Correspondence to Yuheng Li.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 4736 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Li, Y., Li, Y., Lu, J., Shechtman, E., Lee, Y.J., Singh, K.K. (2022). Contrastive Learning for Diverse Disentangled Foreground Generation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13676. Springer, Cham. https://doi.org/10.1007/978-3-031-19787-1_19


  • DOI: https://doi.org/10.1007/978-3-031-19787-1_19


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19786-4

  • Online ISBN: 978-3-031-19787-1

  • eBook Packages: Computer Science (R0)
