ABSTRACT
Implicit neural representation (INR) has emerged as a powerful paradigm for 2D image representation. Recent works such as INR-GAN have successfully adopted INRs for 2D image synthesis. However, these methods lack the explicit control over the generated images that their 3D-aware image synthesis counterparts, such as GIRAFFE, achieve. Our work investigates INRs for the task of controllable image synthesis. We propose a novel framework that allows manipulation of the foreground, the background, and their shape and appearance in the latent space. To achieve effective control over these attributes, we introduce a novel feature mask coupling technique that leverages the foreground and background masks for mutual learning. Extensive quantitative and qualitative analysis shows that our model successfully disentangles the latent space and allows changing the shape and appearance of the foreground and/or background. We further demonstrate that our network requires less training time than other INR-based image synthesis methods.
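The two ideas the abstract builds on, an INR that maps pixel coordinates to RGB values and mask-based compositing of a foreground and a background, can be sketched as follows. This is a minimal illustrative NumPy example, not the paper's actual architecture: the network sizes, the sine activation (in the spirit of SIREN), and the hand-crafted disc mask are all assumptions made here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(in_dim=2, hidden=32, out_dim=3):
    """Random weights for a tiny 2-layer coordinate MLP (illustrative sizes)."""
    return {
        "w1": rng.normal(0.0, 1.0, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.1, (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }

def inr_forward(params, coords):
    """coords: (N, 2) in [-1, 1]; returns (N, 3) RGB values in (0, 1)."""
    h = np.sin(coords @ params["w1"] + params["b1"])   # periodic activation
    rgb = h @ params["w2"] + params["b2"]
    return 1.0 / (1.0 + np.exp(-rgb))                  # sigmoid to (0, 1)

# Render an image by querying the INR at every pixel centre.
H = W = 16
ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
coords = np.stack([xs.ravel(), ys.ravel()], axis=-1)

fg = inr_forward(mlp_init(), coords).reshape(H, W, 3)  # foreground branch
bg = inr_forward(mlp_init(), coords).reshape(H, W, 3)  # background branch

# Hypothetical soft foreground mask in [0, 1]; in the paper the mask is
# predicted by the network, here a centred disc stands in for it.
mask = ((xs**2 + ys**2) < 0.5).astype(float)[..., None]

# Compositing: image = mask * foreground + (1 - mask) * background.
image = mask * fg + (1 - mask) * bg
```

Because each pixel is a function of its coordinate alone, the same network can be queried at any resolution, which is the property that makes INRs attractive for image synthesis.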
REFERENCES
- Jonas Adler and Sebastian Lunz. 2018. Banach wasserstein gan. Advances in Neural Information Processing Systems 31 (2018).
- Ivan Anokhin, Kirill Demochkin, Taras Khakhulin, Gleb Sterkin, Victor Lempitsky, and Denis Korzhenkov. 2021. Image generators with conditionally-independent pixel synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 14278–14287.
- Yoshua Bengio, Aaron C. Courville, and Pascal Vincent. 2013. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2013), 1798–1828.
- Rohan Chabra, Jan E Lenssen, Eddy Ilg, Tanner Schmidt, Julian Straub, Steven Lovegrove, and Richard Newcombe. 2020. Deep local shapes: Learning local sdf priors for detailed 3d reconstruction. In European Conference on Computer Vision. Springer, 608–625.
- Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. 2016. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. Advances in Neural Information Processing Systems 29 (2016).
- Yinbo Chen, Sifei Liu, and Xiaolong Wang. 2021. Learning continuous image representation with local implicit image function. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8628–8638.
- Zhiqin Chen and Hao Zhang. 2019. Learning implicit fields for generative shape modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5939–5948.
- Julian Chibane, Gerard Pons-Moll, et al. 2020. Neural unsigned distance fields for implicit function learning. Advances in Neural Information Processing Systems 33 (2020), 21638–21652.
- Emily L Denton et al. 2017. Unsupervised learning of disentangled representations from video. Advances in Neural Information Processing Systems 30 (2017).
- Ming Ding, Zhuoyi Yang, Wenyi Hong, Wendi Zheng, Chang Zhou, Da Yin, Junyang Lin, Xu Zou, Zhou Shao, Hongxia Yang, et al. 2021. Cogview: Mastering text-to-image generation via transformers. Advances in Neural Information Processing Systems 34 (2021), 19822–19835.
- Kyle Genova, Forrester Cole, Daniel Vlasic, Aaron Sarna, William T Freeman, and Thomas Funkhouser. 2019. Learning shape templates with structured implicit functions. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7154–7164.
- Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. Advances in Neural Information Processing Systems 27 (2014).
- Sonam Gupta, Arti Keshari, and Sukhendu Das. 2021. G3AN++: exploring wide GANs with complementary feature learning for video generation. In Proceedings of the Twelfth Indian Conference on Computer Vision, Graphics and Image Processing. 1–9.
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
- Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In Advances in Neural Information Processing Systems.
- Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. 2016. beta-vae: Learning basic visual concepts with a constrained variational framework. (2016).
- Qiyang Hu, Attila Szabó, Tiziano Portenier, Paolo Favaro, and Matthias Zwicker. 2018. Disentangling factors of variation by mixing them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3399–3407.
- Xun Huang and Serge Belongie. 2017. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision. 1501–1510.
- Chiyu Jiang, Avneesh Sud, Ameesh Makadia, Jingwei Huang, Matthias Nießner, Thomas Funkhouser, et al. 2020. Local implicit grid representations for 3d scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6001–6010.
- Songyao Jiang, Zhiqiang Tao, and Yun Fu. 2019. Segmentation guided image-to-image translation with adversarial networks. In 14th IEEE International Conference on Automatic Face & Gesture Recognition.
- Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. 2021. Alias-free generative adversarial networks. Advances in Neural Information Processing Systems 34 (2021), 852–863.
- Tero Karras, Samuli Laine, and Timo Aila. 2019. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4401–4410.
- Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. 2020. Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8110–8119.
- Arti Keshari, Sonam Gupta, and Sukhendu Das. 2021. V3GAN: Decomposing Background, Foreground and Motion for Video Generation. (2021).
- Wonkwang Lee, Donggyun Kim, Seunghoon Hong, and Honglak Lee. 2020. High-fidelity synthesis with disentangled representation. In European Conference on Computer Vision. Springer, 157–174.
- Yuheng Li, Krishna Kumar Singh, Yang Xue, and Yong Jae Lee. 2021. Partgan: Weakly-supervised part decomposition for image generation and segmentation. In British Machine Vision Conference (BMVC).
- Meichen Liu, Xin Yan, Chenhui Wang, and Kejun Wang. 2021. Segmentation mask-guided person image generation. Applied Intelligence 51, 2 (2021), 1161–1176.
- Lars Mescheder, Andreas Geiger, and Sebastian Nowozin. 2018. Which training methods for GANs do actually converge? In International Conference on Machine Learning. PMLR, 3481–3490.
- Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. 2020. Nerf: Representing scenes as neural radiance fields for view synthesis. In European Conference on Computer Vision. Springer, 405–421.
- Thu H Nguyen-Phuoc, Christian Richardt, Long Mai, Yongliang Yang, and Niloy Mitra. 2020. Blockgan: Learning 3d object-aware scene representations from unlabelled images. Advances in Neural Information Processing Systems 33 (2020), 6767–6778.
- Michael Niemeyer and Andreas Geiger. 2021. Giraffe: Representing scenes as compositional generative neural feature fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11453–11464.
- Augustus Odena, Christopher Olah, and Jonathon Shlens. 2017. Conditional image synthesis with auxiliary classifier gans. In International Conference on Machine Learning. PMLR, 2642–2651.
- Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. 2019. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 165–174.
- William Peebles, John Peebles, Jun-Yan Zhu, Alexei Efros, and Antonio Torralba. 2020. The hessian penalty: A weak prior for unsupervised disentanglement. In European Conference on Computer Vision. Springer, 581–597.
- Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred Hamprecht, Yoshua Bengio, and Aaron Courville. 2019. On the spectral bias of neural networks. In International Conference on Machine Learning. PMLR, 5301–5310.
- Shunsuke Saito, Zeng Huang, Ryota Natsume, Shigeo Morishima, Angjoo Kanazawa, and Hao Li. 2019. Pifu: Pixel-aligned implicit function for high-resolution clothed human digitization. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 2304–2314.
- Katja Schwarz, Yiyi Liao, Michael Niemeyer, and Andreas Geiger. 2020. Graf: Generative radiance fields for 3d-aware image synthesis. Advances in Neural Information Processing Systems 33 (2020), 20154–20166.
- Krishna Kumar Singh, Utkarsh Ojha, and Yong Jae Lee. 2019. Finegan: Unsupervised hierarchical disentanglement for fine-grained object generation and discovery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6490–6499.
- Vincent Sitzmann, Julien N. P. Martel, Alexander W. Bergman, David B. Lindell, and Gordon Wetzstein. 2020. Implicit Neural Representations with Periodic Activation Functions. arXiv preprint arXiv:2006.09661 (2020).
- Ivan Skorokhodov, Savva Ignatyev, and Mohamed Elhoseiny. 2021. Adversarial generation of continuous images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10753–10764.
- T. Tieleman and G. Hinton. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4, 2 (2012), 26–31.
- Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. 2016. Generating videos with scene dynamics. Advances in Neural Information Processing Systems 29 (2016), 613–621.
- Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-chun Woo. 2015. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Advances in Neural Information Processing Systems. 802–810.
- Yang Xue, Yuheng Li, Krishna Kumar Singh, and Yong Jae Lee. 2022. GIRAFFE HD: A High-Resolution 3D-aware Generative Model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 18440–18449.
- Jianwei Yang, Anitha Kannan, Dhruv Batra, and Devi Parikh. 2017. Lr-gan: Layered recursive generative adversarial networks for image generation. arXiv preprint arXiv:1703.01560 (2017).
- Linjie Yang, Ping Luo, Chen Change Loy, and Xiaoou Tang. 2015. A large-scale car dataset for fine-grained categorization and verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3973–3981.
- Yasin Yazıcı, Chuan-Sheng Foo, Stefan Winkler, Kim-Hui Yap, Georgios Piliouras, Vijay Chandrasekhar, et al. 2018. The unusual effectiveness of averaging in GAN training. In International Conference on Learning Representations.
- Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. 2015. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365 (2015).
- Minfeng Zhu, Pingbo Pan, Wei Chen, and Yi Yang. 2019. Dm-gan: Dynamic memory generative adversarial networks for text-to-image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5802–5810.
Index Terms
- Controllable Image Synthesis via Feature Mask Coupling using Implicit Neural Representation