DOI: 10.1145/3503161.3548415

Customizing GAN Using Few-shot Sketches

Published: 10 October 2022

ABSTRACT

Generative adversarial networks (GANs) have demonstrated remarkable success in image synthesis, but their performance deteriorates in limited-data regimes. The fundamental challenge is that it is extremely difficult to synthesize photo-realistic, highly diversified images that still capture meaningful attributes of the targets under minimal supervision. Previous methods either fine-tune or rewrite the model weights to adapt to few-shot datasets; however, fine-tuning tends to overfit, while rewriting requires access to the large-scale source data on which the model was trained. To tackle this problem, we propose a framework that repurposes an existing pre-trained generative model using only a few sketches (e.g., fewer than 30). Unlike previous works, we transfer sample diversity and quality without accessing the source data by enforcing inter-domain distance consistency. By employing cross-domain adversarial learning, we encourage the model output to closely resemble the input sketches in both shape and pose. Extensive experiments show that our method significantly outperforms existing approaches in sample quality and diversity, and qualitative and quantitative results on standard datasets further demonstrate its efficacy. On the widely used gabled-church benchmark, we achieve a Fréchet inception distance (FID) of 15.63.
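
The abstract compresses the method into two training signals: a cross-domain adversarial loss, which maps generated images into the sketch domain before a discriminator compares them against the few user sketches, and an inter-domain distance-consistency loss, which preserves the diversity of the frozen source generator without accessing its training data. Below is a minimal PyTorch sketch of how the two terms could fit together; every name (G_source, G_target, D_sketch, sketcher) is a hypothetical placeholder reconstructed from the abstract, not the authors' released implementation.

import torch
import torch.nn.functional as F

def cross_domain_adversarial_loss(D_sketch, sketcher, fake_images, real_sketches):
    # Non-saturating logistic GAN loss evaluated in the sketch domain.
    # `sketcher` is an assumed frozen image-to-sketch network that maps
    # generated photos into the domain of the user-provided sketches.
    fake_sketches = sketcher(fake_images)
    loss_g = F.softplus(-D_sketch(fake_sketches)).mean()
    loss_d = (F.softplus(D_sketch(fake_sketches.detach())).mean()
              + F.softplus(-D_sketch(real_sketches)).mean())
    return loss_g, loss_d

def distance_consistency_loss(G_source, G_target, z_batch):
    # Match the pairwise similarity structure of one latent batch under the
    # frozen source generator and the adapted target generator, so the
    # target inherits the source's sample diversity without source data.
    def pairwise_dist(images):
        flat = images.flatten(1)
        sims = F.cosine_similarity(flat.unsqueeze(0), flat.unsqueeze(1), dim=-1)
        b = sims.size(0)
        off_diag = sims[~torch.eye(b, dtype=torch.bool, device=sims.device)]
        return F.softmax(off_diag.view(b, b - 1), dim=-1)  # drop self-similarity
    with torch.no_grad():
        p_src = pairwise_dist(G_source(z_batch))
    p_tgt = pairwise_dist(G_target(z_batch))
    return F.kl_div(p_tgt.log(), p_src, reduction="batchmean")

In a training loop, the target generator would minimize loss_g plus a weighted distance-consistency term while D_sketch minimizes loss_d. The reported metric, Fréchet inception distance, compares Gaussian fits to Inception features of real and generated image sets, FID = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2(Sigma_r Sigma_g)^(1/2)); lower is better.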


Supplemental Material

MM22-fp3116.mp4 (MP4 video, 13 MB)


Published in

MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022, 7537 pages
ISBN: 9781450392037
DOI: 10.1145/3503161

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 10 October 2022


Qualifiers

research-article

Acceptance Rates

Overall acceptance rate: 995 of 4,171 submissions (24%)

