research-article

CLIPasso: semantically-aware object sketching

Authors:
Yael Vinker, Tel Aviv University, Israel
Ehsan Pajouheshgar, Swiss Federal Institute of Technology (EPFL), Switzerland
Jessica Y. Bo, Swiss Federal Institute of Technology (EPFL), Switzerland
Roman Christian Bachmann, Swiss Federal Institute of Technology (EPFL), Switzerland
Amit Haim Bermano, Tel Aviv University, Israel
Daniel Cohen-Or, Tel Aviv University, Israel
Amir Zamir, Swiss Federal Institute of Technology (EPFL), Switzerland
Ariel Shamir, Reichman University, Israel

ACM Transactions on Graphics, Volume 41, Issue 4, Article No. 86, pp. 1–11. https://doi.org/10.1145/3528223.3530068

Published: 22 July 2022

Abstract

Abstraction is at the heart of sketching due to the simple and minimal nature of line drawings. Abstraction entails identifying the essential visual properties of an object or scene, which requires semantic understanding and prior knowledge of high-level concepts. Abstract depictions are therefore challenging for artists, and even more so for machines. We present CLIPasso, an object sketching method that can achieve different levels of abstraction, guided by geometric and semantic simplifications. While sketch generation methods often rely on explicit sketch datasets for training, we utilize the remarkable ability of CLIP (Contrastive-Language-Image-Pretraining) to distill semantic concepts from sketches and images alike. We define a sketch as a set of Bézier curves and use a differentiable rasterizer to optimize the parameters of the curves directly with respect to a CLIP-based perceptual loss. The abstraction degree is controlled by varying the number of strokes. The generated sketches demonstrate multiple levels of abstraction while maintaining recognizability, underlying structure, and essential visual components of the subject drawn.
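
The stroke representation described in the abstract can be illustrated with a minimal sketch. It assumes each stroke is a cubic Bézier curve with four 2D control points (the abstract does not state the curve degree) and uses hypothetical helper names (`cubic_bezier`, `sample_stroke`, `random_sketch`). It shows only how a sketch is parameterized and how the stroke count sets its size. The differentiable rasterizer and the CLIP-based loss that drive the actual optimization are not reproduced here.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]  # assumption: four control points of one cubic Bezier curve


def cubic_bezier(stroke: Stroke, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    p0, p1, p2, p3 = stroke
    u = 1.0 - t
    # Bernstein basis weights for degree 3; they sum to 1 for any t.
    b0, b1, b2, b3 = u ** 3, 3 * u ** 2 * t, 3 * u * t ** 2, t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


def sample_stroke(stroke: Stroke, num_points: int = 16) -> List[Point]:
    """Sample evenly spaced parameter values along one stroke (num_points >= 2)."""
    return [cubic_bezier(stroke, i / (num_points - 1)) for i in range(num_points)]


def random_sketch(num_strokes: int, seed: int = 0) -> List[Stroke]:
    """Create a sketch as a set of strokes with random control points in [0, 1]^2.

    Random initialization is an illustrative choice, not the paper's procedure.
    """
    rng = random.Random(seed)
    return [
        [(rng.random(), rng.random()) for _ in range(4)]
        for _ in range(num_strokes)
    ]


if __name__ == "__main__":
    # Fewer strokes means fewer free parameters, hence a more abstract sketch.
    for n in (4, 8, 16):
        sketch = random_sketch(n)
        num_params = sum(2 * len(stroke) for stroke in sketch)
        print(n, "strokes ->", num_params, "optimizable coordinates")
```

In an optimization setting, the control-point coordinates would be the free parameters, and the number of strokes would be fixed in advance to choose the abstraction level.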

Supplemental Material

3528223.3530068.mp4: presentation (mp4, 155.8 MB)

Available for Download

supplemental material (zip, 5.7 MB)

Code for the paper "CLIPasso: semantically-aware object sketching" presented in SIGGRAPH 2022 and published in ACM Transactions on Graphics (TOG). The code is also available via GitHub: http://www.replicabilitystamp.org#https-github-com-yael-vinker-clipasso (zip, 7.2 MB)

3528223.3530068.vtt (vtt, 14.9 KB)

Index Terms

CLIPasso: semantically-aware object sketching
1. Computing methodologies
  1. Computer graphics

Published in

ACM Transactions on Graphics Volume 41, Issue 4
July 2022
1978 pages
ISSN:0730-0301
EISSN:1557-7368
DOI:10.1145/3528223

Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
image-based rendering
sketch synthesis
vector line art generation
Qualifiers
- research-article
Article Metrics
- Total Citations: 16
- Total Downloads: 989
- Downloads (Last 12 months): 588
- Downloads (Last 6 weeks): 77