DOI:10.1145/3394171.3413760
research-article

SpatialGAN: Progressive Image Generation Based on Spatial Recursive Adversarial Expansion

Published: 12 October 2020

Abstract

Image generation models based on generative adversarial networks (GANs) have recently received significant attention and can produce diverse, sharp, and realistic images. However, generating high-resolution images has long been a challenge. In this paper, we propose a progressive spatial recursive adversarial expansion model (called SpatialGAN) capable of producing high-quality samples of natural images. Our approach uses a cascade of convolutional networks to progressively generate images in a part-to-whole fashion. At each level of spatial expansion, a separate image-to-image spatial adversarial expansion network (a conditional GAN, or CGAN) is recursively trained on the context image generated by the previous GAN or CGAN. Unlike other coarse-to-fine generative methods that constrain the generative process either by multi-scale resolution or by hierarchical features, SpatialGAN decomposes the image space into multiple subspaces and gradually resolves uncertainties in the local-to-whole generative process. SpatialGAN greatly stabilizes and speeds up training, which allows us to produce images of high quality. Based on the Inception Score and Fréchet Inception Distance, we demonstrate that the quality of images generated by SpatialGAN on several typical datasets is better than that of images generated by GANs without cascading and comparable to that of state-of-the-art methods with cascading.
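The part-to-whole cascade described in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: the function names (`seed_generator`, `expand`, `generate`), the centre-out layout, the doubling of the canvas at each level, and the 16-pixel seed are all assumptions made for illustration, and the networks are replaced by random-noise stand-ins.

```python
import numpy as np


def seed_generator(rng, size):
    """Hypothetical stand-in for the first (unconditional) GAN.

    Returns a small square RGB patch with values in [0, 1).
    """
    return rng.random((size, size, 3))


def expand(context, rng):
    """Hypothetical stand-in for one conditional expansion network.

    Doubles the canvas height and width, keeps the context patch in the
    centre, and fills the new surrounding region. A trained conditional GAN
    would synthesise that region from the context; here it is plain noise.
    The centre-out, factor-of-two geometry is an assumption.
    """
    h, w, c = context.shape
    canvas = rng.random((2 * h, 2 * w, c))
    top, left = h // 2, w // 2
    canvas[top:top + h, left:left + w] = context
    return canvas


def generate(levels, seed_size=16, rng=None):
    """Run the cascade: one seed patch, then `levels` spatial expansions."""
    if rng is None:
        rng = np.random.default_rng(0)
    image = seed_generator(rng, seed_size)
    for _ in range(levels):
        # Each level is conditioned only on the output of the previous one.
        image = expand(image, rng)
    return image


image = generate(levels=3)
print(image.shape)  # (128, 128, 3)
```

In this sketch each level sees only the image produced by the level before it, which mirrors the recursive conditioning the abstract describes; the actual subspace decomposition and training procedure are given in the paper itself.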

Supplementary Material

MP4 File (3394171.3413760.mp4)

Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN:9781450379885
DOI:10.1145/3394171
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. generative adversarial network
  2. progressive image generation
  3. spatial recursive adversarial expansion

Qualifiers

  • Research-article

Conference

MM '20

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

  • Downloads (Last 12 months): 14
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 03 Mar 2025

Cited By

  • (2024) Unsupervised content and style learning for multimodal cross-domain image translation. Scientific Reports 14:1. DOI:10.1038/s41598-024-80508-w. Online publication date: 27-Nov-2024
  • (2023) Progressive Positive Association Framework for Image and Text Retrieval. Proceedings of the 31st ACM International Conference on Multimedia (4807-4815). DOI:10.1145/3581783.3612507. Online publication date: 26-Oct-2023
  • (2023) Self-Reference Image Super-Resolution via Pre-trained Diffusion Large Model and Window Adjustable Transformer. Proceedings of the 31st ACM International Conference on Multimedia (7981-7992). DOI:10.1145/3581783.3611866. Online publication date: 26-Oct-2023
  • (2023) Knowing What it is: Semantic-Enhanced Dual Attention Transformer. IEEE Transactions on Multimedia 25 (3723-3736). DOI:10.1109/TMM.2022.3164787. Online publication date: 2023
  • (2021) Image Style Transfer Algorithm Based on Semantic Segmentation. IEEE Access 9 (54518-54529). DOI:10.1109/ACCESS.2021.3054969. Online publication date: 2021
