ABSTRACT
Generative adversarial networks (GANs) have achieved remarkable success in image synthesis, but their performance deteriorates in limited-data regimes. The fundamental challenge is synthesizing photo-realistic, highly diverse images that still capture meaningful attributes of the targets under minimal supervision. Previous methods either fine-tune or rewrite model weights to adapt to few-shot datasets; the former tends to overfit, while the latter requires access to the large-scale data on which the models were trained. To tackle this problem, we propose a framework that repurposes existing pre-trained generative models using only a few sketches (e.g., fewer than 30). Unlike previous work, we transfer sample diversity and quality without accessing the source data by enforcing inter-domain distance consistency. By employing cross-domain adversarial learning, we encourage the model output to closely resemble the input sketches in both shape and pose. Extensive experiments show that our method significantly outperforms existing approaches in sample quality and diversity, and qualitative and quantitative results on standard datasets further demonstrate its efficacy. On the widely used Gabled Church dataset, we achieve a Fréchet inception distance (FID) of 15.63.
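The inter-domain distance consistency mentioned above can be illustrated with a minimal sketch: for a batch of latents, the pairwise similarity structure of features from the source (frozen) generator is matched against that of the adapted generator via a KL divergence, so relative distances between samples — and hence diversity — are preserved during adaptation. The function names and the use of cosine similarity over generic feature vectors are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax along the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def pairwise_cosine(feats):
    # feats: (N, D) features for one batch of latents; returns (N, N) similarities.
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return f @ f.T

def distance_consistency_loss(src_feats, tgt_feats):
    # For each sample, compare its similarity distribution over the rest of
    # the batch under the source generator vs. the adapted generator.
    n = src_feats.shape[0]
    mask = ~np.eye(n, dtype=bool)  # drop self-similarity on the diagonal
    p = softmax(pairwise_cosine(src_feats)[mask].reshape(n, n - 1))
    q = softmax(pairwise_cosine(tgt_feats)[mask].reshape(n, n - 1))
    # KL(p || q), averaged over the batch; zero when both structures agree.
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=1)))
```

In training, this term would be added to the cross-domain adversarial loss so the adapted generator matches the sketch domain in shape and pose while inheriting the source model's sample diversity.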