Abstract
Sketches reflect the drawing style of individual artists; therefore, it is important to consider their unique styles when extracting sketches from color images for various applications. Unfortunately, most existing sketch extraction methods are designed to extract sketches of a single style. Although there have been some attempts to generate sketches in various styles, these methods generally suffer from two limitations: low-quality results and difficulty in training the model due to the requirement of a paired dataset. In this paper, we propose a novel multi-modal sketch extraction method that can imitate the style of a given reference sketch and can be trained on unpaired data in a semi-supervised manner. Our method outperforms state-of-the-art sketch extraction methods and unpaired image translation methods in both quantitative and qualitative evaluations.
Index Terms
- Semi-supervised reference-based sketch extraction using a contrastive learning framework