DOI: 10.1145/3571600.3571646

research-article

Controllable Image Synthesis via Feature Mask Coupling using Implicit Neural Representation✱

Published: 12 May 2023

ABSTRACT

Implicit neural representation (INR) has emerged as a powerful paradigm for 2D image representation. Recent works such as INR-GAN have successfully adopted INRs for 2D image synthesis. However, these methods lack the explicit control over the generated images that their 3D-aware counterparts, such as GIRAFFE, achieve. Our work investigates INRs for the task of controllable image synthesis. We propose a novel framework that allows the foreground, the background, and their shape and appearance to be manipulated in the latent space. To achieve effective control over these attributes, we introduce a novel feature mask coupling technique that leverages the foreground and background masks for mutual learning. Extensive quantitative and qualitative analysis shows that our model successfully disentangles the latent space and allows the shape and appearance of the foreground and/or background to be changed. We further demonstrate that our network requires less training time than other INR-based image synthesis methods.
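The abstract gives no implementation details, so the following is only an illustrative sketch, not the paper's method: it assumes that "coupling" the foreground and background masks means constraining them to sum to one, and that the composite image features are a mask-weighted blend of the foreground and background feature maps. All names here (`couple_masks`, `composite`) are hypothetical.

```python
import numpy as np

def couple_masks(fg_logits):
    """Hypothetical coupling: derive a soft foreground mask from logits,
    and define the background mask as its complement, so m_fg + m_bg == 1."""
    m_fg = 1.0 / (1.0 + np.exp(-fg_logits))  # sigmoid -> values in (0, 1)
    m_bg = 1.0 - m_fg
    return m_fg, m_bg

def composite(fg_feat, bg_feat, m_fg):
    """Blend foreground and background feature maps with the coupled masks."""
    return m_fg * fg_feat + (1.0 - m_fg) * bg_feat

# Toy 4x4 single-channel "feature maps": all-ones foreground, all-zeros background.
fg = np.ones((4, 4))
bg = np.zeros((4, 4))

# Strongly positive logits give a near-1 foreground mask everywhere,
# so the composite should be dominated by the foreground features.
m_fg, m_bg = couple_masks(np.full((4, 4), 10.0))
out = composite(fg, bg, m_fg)
```

Under this complementarity assumption, a gradient flowing into either mask automatically affects the other, which is one plausible reading of the "mutual learning" between foreground and background described above.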

References

  1. Jonas Adler and Sebastian Lunz. 2018. Banach Wasserstein GAN. Advances in Neural Information Processing Systems 31 (2018).
  2. Ivan Anokhin, Kirill Demochkin, Taras Khakhulin, Gleb Sterkin, Victor Lempitsky, and Denis Korzhenkov. 2021. Image generators with conditionally-independent pixel synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 14278–14287.
  3. Yoshua Bengio, Aaron C. Courville, and Pascal Vincent. 2013. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2013), 1798–1828.
  4. Rohan Chabra, Jan E Lenssen, Eddy Ilg, Tanner Schmidt, Julian Straub, Steven Lovegrove, and Richard Newcombe. 2020. Deep local shapes: Learning local SDF priors for detailed 3D reconstruction. In European Conference on Computer Vision. Springer, 608–625.
  5. Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. 2016. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. Advances in Neural Information Processing Systems 29 (2016).
  6. Yinbo Chen, Sifei Liu, and Xiaolong Wang. 2021. Learning continuous image representation with local implicit image function. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8628–8638.
  7. Zhiqin Chen and Hao Zhang. 2019. Learning implicit fields for generative shape modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5939–5948.
  8. Julian Chibane, Gerard Pons-Moll, et al. 2020. Neural unsigned distance fields for implicit function learning. Advances in Neural Information Processing Systems 33 (2020), 21638–21652.
  9. Emily L Denton et al. 2017. Unsupervised learning of disentangled representations from video. Advances in Neural Information Processing Systems 30 (2017).
  10. Ming Ding, Zhuoyi Yang, Wenyi Hong, Wendi Zheng, Chang Zhou, Da Yin, Junyang Lin, Xu Zou, Zhou Shao, Hongxia Yang, et al. 2021. CogView: Mastering text-to-image generation via transformers. Advances in Neural Information Processing Systems 34 (2021), 19822–19835.
  11. Kyle Genova, Forrester Cole, Daniel Vlasic, Aaron Sarna, William T Freeman, and Thomas Funkhouser. 2019. Learning shape templates with structured implicit functions. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7154–7164.
  12. Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. Advances in Neural Information Processing Systems 27 (2014).
  13. Sonam Gupta, Arti Keshari, and Sukhendu Das. 2021. G3AN++: Exploring wide GANs with complementary feature learning for video generation. In Proceedings of the Twelfth Indian Conference on Computer Vision, Graphics and Image Processing. 1–9.
  14. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
  15. Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and S. Hochreiter. 2017. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In NIPS.
  16. Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. 2016. beta-VAE: Learning basic visual concepts with a constrained variational framework. (2016).
  17. Qiyang Hu, Attila Szabó, Tiziano Portenier, Paolo Favaro, and Matthias Zwicker. 2018. Disentangling factors of variation by mixing them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3399–3407.
  18. Xun Huang and Serge Belongie. 2017. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision. 1501–1510.
  19. Chiyu Jiang, Avneesh Sud, Ameesh Makadia, Jingwei Huang, Matthias Nießner, and Thomas Funkhouser. 2020. Local implicit grid representations for 3D scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6001–6010.
  20. Songyao Jiang, Zhiqiang Tao, and Yun Fu. 2019. Segmentation guided image-to-image translation with adversarial networks. In 14th IEEE International Conference on Automatic Face & Gesture Recognition.
  21. Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. 2021. Alias-free generative adversarial networks. Advances in Neural Information Processing Systems 34 (2021), 852–863.
  22. Tero Karras, Samuli Laine, and Timo Aila. 2019. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4401–4410.
  23. Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. 2020. Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8110–8119.
  24. Arti Keshari, Sonam Gupta, and Sukhendu Das. 2021. V3GAN: Decomposing Background, Foreground and Motion for Video Generation. (2021).
  25. Wonkwang Lee, Donggyun Kim, Seunghoon Hong, and Honglak Lee. 2020. High-fidelity synthesis with disentangled representation. In European Conference on Computer Vision. Springer, 157–174.
  26. Yuheng Li, Krishna Kumar Singh, Yang Xue, and Yong Jae Lee. 2021. PartGAN: Weakly-supervised part decomposition for image generation and segmentation. In British Machine Vision Conference (BMVC).
  27. Meichen Liu, Xin Yan, Chenhui Wang, and Kejun Wang. 2021. Segmentation mask-guided person image generation. Applied Intelligence 51, 2 (2021), 1161–1176.
  28. Lars Mescheder, Andreas Geiger, and Sebastian Nowozin. 2018. Which training methods for GANs do actually converge? In International Conference on Machine Learning. PMLR, 3481–3490.
  29. Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. 2020. NeRF: Representing scenes as neural radiance fields for view synthesis. In European Conference on Computer Vision. Springer, 405–421.
  30. Thu H Nguyen-Phuoc, Christian Richardt, Long Mai, Yongliang Yang, and Niloy Mitra. 2020. BlockGAN: Learning 3D object-aware scene representations from unlabelled images. Advances in Neural Information Processing Systems 33 (2020), 6767–6778.
  31. Michael Niemeyer and Andreas Geiger. 2021. GIRAFFE: Representing scenes as compositional generative neural feature fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11453–11464.
  32. Augustus Odena, Christopher Olah, and Jonathon Shlens. 2017. Conditional image synthesis with auxiliary classifier GANs. In International Conference on Machine Learning. PMLR, 2642–2651.
  33. Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. 2019. DeepSDF: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 165–174.
  34. William Peebles, John Peebles, Jun-Yan Zhu, Alexei Efros, and Antonio Torralba. 2020. The hessian penalty: A weak prior for unsupervised disentanglement. In European Conference on Computer Vision. Springer, 581–597.
  35. Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred Hamprecht, Yoshua Bengio, and Aaron Courville. 2019. On the spectral bias of neural networks. In International Conference on Machine Learning. PMLR, 5301–5310.
  36. Shunsuke Saito, Zeng Huang, Ryota Natsume, Shigeo Morishima, Angjoo Kanazawa, and Hao Li. 2019. PIFu: Pixel-aligned implicit function for high-resolution clothed human digitization. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 2304–2314.
  37. Katja Schwarz, Yiyi Liao, Michael Niemeyer, and Andreas Geiger. 2020. GRAF: Generative radiance fields for 3D-aware image synthesis. Advances in Neural Information Processing Systems 33 (2020), 20154–20166.
  38. Krishna Kumar Singh, Utkarsh Ojha, and Yong Jae Lee. 2019. FineGAN: Unsupervised hierarchical disentanglement for fine-grained object generation and discovery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6490–6499.
  39. Vincent Sitzmann, Julien N. P. Martel, Alexander W. Bergman, David B. Lindell, and Gordon Wetzstein. 2020. Implicit Neural Representations with Periodic Activation Functions. arXiv preprint arXiv:2006.09661 (2020).
  40. Ivan Skorokhodov, Savva Ignatyev, and Mohamed Elhoseiny. 2021. Adversarial generation of continuous images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10753–10764.
  41. Tijmen Tieleman and Geoffrey Hinton. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4, 2 (2012), 26–31.
  42. Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. 2016. Generating videos with scene dynamics. Advances in Neural Information Processing Systems 29 (2016), 613–621.
  43. SHI Xingjian, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-chun Woo. 2015. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Advances in Neural Information Processing Systems. 802–810.
  44. Yang Xue, Yuheng Li, Krishna Kumar Singh, and Yong Jae Lee. 2022. GIRAFFE HD: A High-Resolution 3D-aware Generative Model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 18440–18449.
  45. Jianwei Yang, Anitha Kannan, Dhruv Batra, and Devi Parikh. 2017. LR-GAN: Layered recursive generative adversarial networks for image generation. arXiv preprint arXiv:1703.01560 (2017).
  46. Linjie Yang, Ping Luo, Chen Change Loy, and Xiaoou Tang. 2015. A large-scale car dataset for fine-grained categorization and verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3973–3981.
  47. Yasin Yazıcı, Chuan-Sheng Foo, Stefan Winkler, Kim-Hui Yap, Georgios Piliouras, and Vijay Chandrasekhar. 2018. The unusual effectiveness of averaging in GAN training. In International Conference on Learning Representations.
  48. Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. 2015. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365 (2015).
  49. Minfeng Zhu, Pingbo Pan, Wei Chen, and Yi Yang. 2019. DM-GAN: Dynamic memory generative adversarial networks for text-to-image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5802–5810.

Published in

ICVGIP '22: Proceedings of the Thirteenth Indian Conference on Computer Vision, Graphics and Image Processing. December 2022. 506 pages. ISBN: 9781450398220. DOI: 10.1145/3571600.

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article, refereed limited

Overall acceptance rate: 95 of 286 submissions, 33%