Cloud2Sketch: Augmenting Clouds with Imaginary Sketches

Authors:

Zhangyang Wang,

Jiebo Luo

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 2441 - 2451

https://doi.org/10.1145/3503161.3547810

Published: 10 October 2022

Abstract

Have you ever looked up at the sky and imagined what the clouds look like? In this work, we present an interesting task: augmenting clouds in the sky with imagined sketches. Unlike generic image-to-sketch translation, this task introduces unique challenges: real-world clouds resemble recognizable objects to varying degrees; generating a sketch without sketch retrieval can produce something unrecognizable; a sketch retrieved from a dataset cannot be used directly because its shape does not match the cloud; and the optimal imagined sketch is subjective. We propose Cloud2Sketch, a novel self-supervised pipeline that tackles these challenges. First, we pre-process cloud images with a cloud detector and a thresholding algorithm to obtain cloud contours. Then, the cloud contours are passed through a retrieval module to retrieve sketches with similar geometric shapes. Finally, we adopt a novel sketch translation model with built-in free-form deformation to align the sketches to the cloud contours. To facilitate training, we propose Sketchy Zoo, an icon-based sketch collection. Extensive experiments validate the effectiveness of our method both qualitatively and quantitatively.
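The first two stages described above (thresholding to isolate the cloud, then retrieving a sketch of similar geometric shape) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the thresholding rule or the shape descriptor, so the mean-intensity threshold, the use of the first two Hu moment invariants, and all function names (`cloud_mask`, `shape_descriptor`, `retrieve`) are assumptions chosen for illustration. The deformation-based alignment stage is omitted.

```python
import numpy as np

def cloud_mask(gray, thresh=None):
    """Binarize a grayscale sky image. The mean-intensity default is a
    stand-in (assumption) for the paper's detector + thresholding step."""
    if thresh is None:
        thresh = gray.mean()
    return gray > thresh

def shape_descriptor(mask):
    """First two Hu moment invariants of a non-empty binary mask.
    They are invariant to translation, scale, and rotation."""
    ys, xs = np.nonzero(mask)
    m00 = float(xs.size)
    dx, dy = xs - xs.mean(), ys - ys.mean()

    def eta(p, q):
        # Normalized central moment of order (p, q).
        return np.sum(dx**p * dy**q) / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    return np.array([n20 + n02, (n20 - n02) ** 2 + 4 * n11**2])

def retrieve(query_mask, gallery):
    """Return the name of the gallery shape closest to the query."""
    q = shape_descriptor(query_mask)
    dists = {name: np.linalg.norm(q - shape_descriptor(m))
             for name, m in gallery.items()}
    return min(dists, key=dists.get)

# Toy gallery of two "sketch" silhouettes (hypothetical data).
yy, xx = np.mgrid[:64, :64]
disk = (xx - 32) ** 2 + (yy - 32) ** 2 <= 20**2
bar = np.zeros((64, 64), dtype=bool)
bar[20:30, 10:50] = True                 # horizontal bar, 40 wide x 10 tall

# Toy "sky": dark background with a bright, smaller, vertical bar.
sky = np.full((64, 64), 0.2)
sky[10:30, 40:45] = 0.9                  # 5 wide x 20 tall, same aspect ratio

gallery = {"disk": disk, "bar": bar}
best = retrieve(cloud_mask(sky), gallery)
print(best)
```

Because the descriptor ignores position, scale, and rotation, the small vertical bar in the toy sky is matched to the larger horizontal bar in the gallery rather than to the disk. A real system would still need the alignment stage, since a retrieved sketch only roughly matches the cloud outline.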

Supplementary Material

M4V File (MM22-fp297.m4v)

Presentation video

Index Terms

Cloud2Sketch: Augmenting Clouds with Imaginary Sketches
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
  2. Computer graphics

Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

October 2022

7537 pages

ISBN:9781450392037

DOI:10.1145/3503161

General Chairs:
João Magalhães (NOVA University of Lisbon, Portugal),
Alberto del Bimbo (University of Florence, Italy),
Shin'ichi Satoh (National Institute of Informatics, Japan),
Nicu Sebe (University of Trento, Italy)

Program Chairs:
Xavier Alameda-Pineda (Inria, Grenoble, France),
Qin Jin (Renmin University of China, China),
Vincent Oria (New Jersey Institute of Technology, USA),
Laura Toni (University College London, UK)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM '22

Sponsor:

SIGMM

MM '22: The 30th ACM International Conference on Multimedia

October 10 - 14, 2022

Lisboa, Portugal

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
