DOI: 10.1145/3123266.3123321
research-article

Sketch Recognition with Deep Visual-Sequential Fusion Model

Published: 19 October 2017

Abstract

In this paper, a deep end-to-end network for sketch recognition, named the Deep Visual-Sequential Fusion (DVSF) model, is proposed to model both the visual and the sequential patterns of strokes. To capture the intermediate states of sketches, a three-way representation learner is first used to extract visual features. These deep features are fed simultaneously into the visual and sequential networks, which capture spatial and temporal properties, respectively. More specifically, novel visual networks are proposed that learn stroke patterns by stacking Residual Fully-Connected (R-FC) layers, which combine ReLU and Tanh activation functions to achieve sparsity and generalization ability. To learn the patterns of stroke order, the sequential networks are built from Residual Long Short-Term Memory (R-LSTM) units, which optimize the network architecture through skip connections. Finally, the visual and sequential representations of the sketches are integrated by a fusion layer to obtain the final results. Experiments on the TU-Berlin benchmark sketch dataset demonstrate the effectiveness of the proposed method, which outperforms state-of-the-art approaches.
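The abstract names three components (R-FC layers mixing ReLU and Tanh, R-LSTM units with skip connections, and a fusion layer) without giving their equations. The NumPy sketch below shows one plausible forward pass under loudly stated assumptions: an R-FC block of the form x + tanh(W2·relu(W1·x + b1) + b2), an identity skip from each LSTM layer's input to its output, a visual branch that reads only the final sketch state, and fusion by concatenation followed by a linear layer and softmax. The names `RFC`, `RLSTM` and `dvsf_forward` and all dimensions are hypothetical, and random vectors stand in for the three-way representation learner, which is not modeled here. This is an illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()


class RFC:
    """Hypothetical R-FC block: y = x + tanh(W2 @ relu(W1 @ x + b1) + b2)."""

    def __init__(self, dim, hidden, rng):
        self.W1 = rng.normal(0.0, 0.1, (hidden, dim))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (dim, hidden))
        self.b2 = np.zeros(dim)

    def __call__(self, x):
        # ReLU gives a sparse hidden code; Tanh bounds the residual update.
        return x + np.tanh(self.W2 @ relu(self.W1 @ x + self.b1) + self.b2)


class RLSTM:
    """Hypothetical R-LSTM layer: a standard LSTM whose output at each step
    is added to that step's input (identity skip connection)."""

    def __init__(self, dim, rng):
        self.dim = dim
        self.W = rng.normal(0.0, 0.1, (4 * dim, 2 * dim))  # gates from [x_t, h_{t-1}]
        self.b = np.zeros(4 * dim)

    def __call__(self, xs):
        h = np.zeros(self.dim)
        c = np.zeros(self.dim)
        outs = []
        for x in xs:  # xs has shape (T, dim)
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, o, g = np.split(z, 4)  # input, forget, output gates and candidate
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            outs.append(h + x)  # skip connection around the recurrent layer
        return np.stack(outs)


def dvsf_forward(feats, rfc_blocks, rlstm_layers, W_fuse, b_fuse):
    """feats: (T, dim) features of T intermediate sketch states (assumed input)."""
    v = feats[-1]  # visual branch: final sketch state only (assumption)
    for blk in rfc_blocks:
        v = blk(v)
    s = feats  # sequential branch: the whole ordered sequence of states
    for layer in rlstm_layers:
        s = layer(s)
    fused = np.concatenate([v, s[-1]])  # fusion by concatenation (assumption)
    return softmax(W_fuse @ fused + b_fuse)


# Toy usage: random features stand in for the three-way representation learner.
dim, T, n_classes = 16, 5, 250  # 250 = number of TU-Berlin categories
feats = rng.normal(size=(T, dim))
rfc_blocks = [RFC(dim, 32, rng) for _ in range(2)]
rlstm_layers = [RLSTM(dim, rng) for _ in range(2)]
W_fuse = rng.normal(0.0, 0.1, (n_classes, 2 * dim))
b_fuse = np.zeros(n_classes)
probs = dvsf_forward(feats, rfc_blocks, rlstm_layers, W_fuse, b_fuse)
print(probs.shape, int(probs.argmax()))
```

Because every block adds its output to an identity path, a block whose weights are zero simply passes its input through. That is the usual argument for why residual stacks are easier to optimize. The paper's actual layer definitions and fusion rule may differ from these assumed forms.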

Published In

MM '17: Proceedings of the 25th ACM international conference on Multimedia
October 2017
2028 pages
ISBN: 9781450349062
DOI: 10.1145/3123266

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. deep learning
2. long short-term memory
3. residual learning
4. sketch recognition

Conference

MM '17: ACM Multimedia Conference
October 23-27, 2017
Mountain View, California, USA

Acceptance Rates

MM '17 Paper Acceptance Rate: 189 of 684 submissions, 28%
Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%

Article Metrics

• Downloads (last 12 months): 20
• Downloads (last 6 weeks): 2
Reflects downloads up to 16 Feb 2025

Cited By

• (2024) Zero-Shot Sketch-Based Remote-Sensing Image Retrieval Based on Multi-Level and Attention-Guided Tokenization. Remote Sensing 16:10 (1653). DOI: 10.3390/rs16101653. Online publication date: 7-May-2024
• (2024) A Systematic Literature Review of Deep Learning Approaches for Sketch-Based Image Retrieval: Datasets, Metrics, and Future Directions. IEEE Access 12 (14847-14869). DOI: 10.1109/ACCESS.2024.3357939. Online publication date: 2024
• (2024) AI for Supporting the Freedom of Drawing. Machine Intelligence Research 21:1 (63-88). DOI: 10.1007/s11633-023-1438-4. Online publication date: 15-Jan-2024
• (2024) A sketch recognition method based on bi-modal model using cooperative learning paradigm. Neural Computing and Applications 36:23 (14275-14290). DOI: 10.1007/s00521-024-09836-2. Online publication date: 6-May-2024
• (2024) VS-LLM: Visual-Semantic Depression Assessment Based on LLM for Drawing Projection Test. Pattern Recognition and Computer Vision (232-246). DOI: 10.1007/978-981-97-8692-3_17. Online publication date: 1-Nov-2024
• (2023) Deep learning for studying drawing behavior: A review. Frontiers in Psychology 14. DOI: 10.3389/fpsyg.2023.992541. Online publication date: 8-Feb-2023
• (2023) Sketch Input Method Editor: A Comprehensive Dataset and Methodology for Systematic Input Recognition. Proceedings of the 31st ACM International Conference on Multimedia (1055-1065). DOI: 10.1145/3581783.3612115. Online publication date: 27-Oct-2023
• (2023) Deep Learning for Free-Hand Sketch: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 45:1 (285-312). DOI: 10.1109/TPAMI.2022.3148853. Online publication date: 1-Jan-2023
• (2023) Group Activity Representation Learning With Long-Short States Predictive Transformer. IEEE Transactions on Circuits and Systems for Video Technology 33:12 (7267-7281). DOI: 10.1109/TCSVT.2023.3278984. Online publication date: Dec-2023
• (2023) Self-Supervised Learning of Free-Hand Sketches with Bézier Curve Features. 2023 IEEE International Symposium on Multimedia (ISM) (168-171). DOI: 10.1109/ISM59092.2023.00030. Online publication date: 11-Dec-2023
