research-article

Sketch Recognition with Deep Visual-Sequential Fusion Model

Authors:

Qiang Peng

MM '17: Proceedings of the 25th ACM international conference on Multimedia

Pages 448 - 456

https://doi.org/10.1145/3123266.3123321

Published: 19 October 2017

Abstract

This paper proposes a deep end-to-end network for sketch recognition, the Deep Visual-Sequential Fusion model (DVSF), which models both the visual and the sequential patterns of strokes. To capture the intermediate states of a sketch, a three-way representation learner first extracts visual features. These deep features are fed simultaneously into visual and sequential networks, which capture spatial and temporal properties, respectively. The visual networks learn stroke patterns by stacking Residual Fully-Connected (R-FC) layers, which combine ReLU and Tanh activation functions to achieve sparsity and generalization ability. The sequential networks learn stroke-order patterns with Residual Long Short-Term Memory (R-LSTM) units, which optimize the network architecture through skip connections. Finally, a fusion layer integrates the visual and sequential representations of a sketch to produce the final result. Experiments on the TU-Berlin benchmark sketch dataset demonstrate the effectiveness of the proposed method, which outperforms state-of-the-art approaches.
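The abstract does not state exactly how an R-FC layer combines ReLU and Tanh, or how the fusion layer merges the two branches. As a rough illustration only, the sketch below assumes one plausible reading: a residual block of the form x + tanh(W2 · relu(W1 · x)), and fusion by concatenating the two branch outputs before a linear layer and softmax. The names `rfc_block`, `fuse`, and `init_matrix`, the layer sizes, and the placeholder sequential vector are all hypothetical and are not taken from the paper.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def tanh(v):
    return [math.tanh(a) for a in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def rfc_block(x, W1, W2):
    """Hypothetical R-FC block: x + tanh(W2 @ relu(W1 @ x)).

    The ordering of ReLU and Tanh is an assumption, not the paper's definition.
    """
    hidden = relu(matvec(W1, x))        # ReLU zeroes negative units (sparsity)
    branch = tanh(matvec(W2, hidden))   # Tanh bounds the residual branch
    return [a + b for a, b in zip(x, branch)]  # skip connection


def fuse(visual, sequential, W_out):
    """Assumed fusion: concatenate both vectors, then linear layer and softmax."""
    return softmax(matvec(W_out, visual + sequential))  # list + list concatenates


rng = random.Random(0)
dim, num_classes = 4, 3
x = [0.5, -1.0, 0.25, 2.0]            # stand-in for a deep feature vector
W1 = init_matrix(dim, dim, rng)
W2 = init_matrix(dim, dim, rng)
# Two stacked blocks; weights are shared here only to keep the example short.
visual = rfc_block(rfc_block(x, W1, W2), W1, W2)
sequential = [0.1, 0.2, -0.3, 0.0]    # placeholder for the sequential-branch output
W_out = init_matrix(num_classes, 2 * dim, rng)
probs = fuse(visual, sequential, W_out)
print(probs)  # class probabilities that sum to 1
```

With the skip connection, a block whose residual branch outputs zeros passes its input through unchanged. This identity behaviour is what generally makes stacked residual layers easier to optimize. The R-LSTM branch is not modelled here.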

Cited By

Yang B, Wang C, Ma X, Song B, Liu Z, Sun F. (2024). Zero-Shot Sketch-Based Remote-Sensing Image Retrieval Based on Multi-Level and Attention-Guided Tokenization. Remote Sensing 16:10 (1653). Online publication date: 7-May-2024.
https://doi.org/10.3390/rs16101653
Yang F, Ismail N, Pang Y, Kebande V, Al-Dhaqm A, Koh T. (2024). A Systematic Literature Review of Deep Learning Approaches for Sketch-Based Image Retrieval: Datasets, Metrics, and Future Directions. IEEE Access 12 (14847-14869). Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3357939
Sun X, Qin J. (2024). AI for Supporting the Freedom of Drawing. Machine Intelligence Research 21:1 (63-88). Online publication date: 15-Jan-2024.
https://doi.org/10.1007/s11633-023-1438-4

Index Terms

Sketch Recognition with Deep Visual-Sequential Fusion Model
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object recognition

Published In

MM '17: Proceedings of the 25th ACM international conference on Multimedia

October 2017

2028 pages

ISBN:9781450349062

DOI:10.1145/3123266

General Chairs:
Qiong Liu, FXPAL, USA
Rainer Lienhart, Universität Augsburg, Germany
Haohong Wang, TCL America, USA

Program Chairs:
Sheng-Wei "Kuan-Ta" Chen, Academia Sinica, Taiwan
Susanne Boll, University of Oldenburg, Germany
Phoebe Chen, La Trobe University, Australia
Gerald Friedland, Lawrence Livermore National Lab, USA
Jia Li, Google, USA
Shuicheng Yan, Qihoo 360, China

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
Program for Sichuan Provincial Science Fund for Distinguished Young Scholars

Conference

MM '17

Sponsor:

SIGMM

MM '17: ACM Multimedia Conference

October 23 - 27, 2017

Mountain View, California, USA

Acceptance Rates

MM '17 Paper Acceptance Rate 189 of 684 submissions, 28%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Total Citations: 25
Total Downloads: 465
Downloads (Last 12 months): 20
Downloads (Last 6 weeks): 2

Reflects downloads up to 16 Feb 2025

Citations

Zhang S, Wang L, Cui Z, Wang S. (2024). A sketch recognition method based on bi-modal model using cooperative learning paradigm. Neural Computing and Applications 36:23 (14275-14290). Online publication date: 6-May-2024.
https://doi.org/10.1007/s00521-024-09836-2
Wu M, Kang Y, Li X, Hu S, Chen X, Kang Y, Wang W, Huang K. (2024). VS-LLM: Visual-Semantic Depression Assessment Based on LLM for Drawing Projection Test. Pattern Recognition and Computer Vision (232-246). Online publication date: 1-Nov-2024.
https://doi.org/10.1007/978-981-97-8692-3_17
Beltzung B, Pelé M, Renoult J, Sueur C. (2023). Deep learning for studying drawing behavior: A review. Frontiers in Psychology 14. Online publication date: 8-Feb-2023.
https://doi.org/10.3389/fpsyg.2023.992541
Zhu G, Wang S, Cheng Q, Wu K, Li H, Zhang L, El Saddik A, Mei T, Cucchiara R, Bertini M, Tobon Vallejo D, Atrey P, Hossain M. (2023). Sketch Input Method Editor: A Comprehensive Dataset and Methodology for Systematic Input Recognition. Proceedings of the 31st ACM International Conference on Multimedia (1055-1065). Online publication date: 27-Oct-2023.
https://doi.org/10.1145/3581783.3612115
Xu P, Hospedales T, Yin Q, Song Y, Xiang T, Wang L. (2023). Deep Learning for Free-Hand Sketch: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 45:1 (285-312). Online publication date: 1-Jan-2023.
https://doi.org/10.1109/TPAMI.2022.3148853
Kong L, Zhou W, Pei D, He Z, Huang D. (2023). Group Activity Representation Learning With Long-Short States Predictive Transformer. IEEE Transactions on Circuits and Systems for Video Technology 33:12 (7267-7281). Online publication date: Dec-2023.
https://doi.org/10.1109/TCSVT.2023.3278984
Gülez T, Sert M. (2023). Self-Supervised Learning of Free-Hand Sketches with Bézier Curve Features. 2023 IEEE International Symposium on Multimedia (ISM) (168-171). Online publication date: 11-Dec-2023.
https://doi.org/10.1109/ISM59092.2023.00030
