research-article

SWAGAN: a style-based wavelet-driven generative model

Published: 19 July 2021

Abstract

In recent years, considerable progress has been made in the visual quality of Generative Adversarial Networks (GANs). Even so, these networks still suffer from degradation in quality for high-frequency content, stemming from a spectrally biased architecture and similarly unfavorable loss functions. To address this issue, we present a novel general-purpose Style and WAvelet based GAN (SWAGAN) that implements progressive generation in the frequency domain. SWAGAN incorporates wavelets throughout its generator and discriminator architectures, enforcing a frequency-aware latent representation at every step of the way. This approach, designed to directly tackle the spectral bias of neural networks, yields an improvement in the ability to generate medium- and high-frequency content, including structures which other networks fail to learn. We demonstrate the advantage of our method by integrating it into the StyleGAN2 framework and verifying that content generation in the wavelet domain leads to more realistic high-frequency content, even when trained for fewer iterations. Furthermore, we verify that our model's latent space retains the qualities that allow StyleGAN to serve as a basis for a multitude of editing tasks, and show that our frequency-aware approach also induces improved high-frequency performance in downstream tasks.
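The abstract describes generating content in the wavelet domain rather than the pixel domain. As an illustration of the kind of sub-band decomposition involved, here is a minimal single-level 2D Haar wavelet transform and its inverse; this is a generic sketch of the transform, not the paper's actual implementation, which operates on learned features inside the generator and discriminator.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2D Haar wavelet transform.

    Splits an array of shape (H, W) with even dimensions into four
    half-resolution sub-bands: LL (coarse approximation) plus LH, HL,
    and HH (horizontal, vertical, and diagonal detail). The detail
    bands carry the medium/high-frequency content that spectrally
    biased pixel-space generators tend to under-represent.
    """
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0
    lh = (a + b - c - d) / 2.0
    hl = (a - b + c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse transform: exact reconstruction from the four sub-bands."""
    h, w = ll.shape
    img = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    img[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    img[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    img[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    img[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return img
```

Because the transform is invertible, a generator can synthesize the four sub-bands directly and recover an image losslessly via the inverse transform, which is what makes frequency-domain generation a drop-in replacement for pixel-space output.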


Supplemental Material

Presentation video: 3450626.3459836.mp4 (mp4, 164.5 MB)



• Published in

  ACM Transactions on Graphics, Volume 40, Issue 4
  August 2021, 2170 pages
  ISSN: 0730-0301
  EISSN: 1557-7368
  DOI: 10.1145/3450626

        Copyright © 2021 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

