Pattern Recognition Letters

Volume 130, February 2020, Pages 73-82
A Hybrid convolutional neural network for sketch recognition

https://doi.org/10.1016/j.patrec.2019.01.006

Highlights

  • We propose a novel Hybrid CNN architecture to address the problem of sketch recognition.

  • S-Net extracts shape features that are invariant to sketch rotation and transformation.

  • Hybrid CNN achieves state-of-the-art performance on sketch classification and sketch-based image retrieval tasks.

  • The classification accuracy is 84.42% and 82.74% on the TU-Berlin and Sketchy datasets, respectively.

  • The MAP of SBIR is 57.4% and 28.74% on the Sketchy and Flickr15K-Large datasets, respectively.

Abstract

With the popularity of touch-screen devices, understanding users’ free-hand sketches is becoming increasingly important in computer vision and human-computer interaction. Most existing sketch recognition methods employ strategies similar to those used in image recognition, relying on appearance information represented by hand-crafted features or deep features from convolutional neural networks. We believe that sketch recognition can benefit from learning both appearance and shape representations. In this paper, we propose a novel architecture, named Hybrid CNN, which is composed of A-Net and S-Net, describing appearance information and shape information, respectively. Hybrid CNN is then comprehensively evaluated on sketch classification and retrieval tasks using different datasets, including TU-Berlin, Sketchy and Flickr15k. Experimental results demonstrate that Hybrid CNN achieves competitive accuracy compared with state-of-the-art methods.

Introduction

Sketching is widely used in daily life, and the free-hand sketch is a simple yet powerful tool for communicating, recording and expressing ideas. Sketch recognition has attracted increasing attention due to the widespread use of touch screens on portable devices. However, it is very difficult to interpret free-hand sketches automatically. Two of the reasons are: 1) natural images contain abundant details of color and texture, whereas sketches are highly abstract and contain only limited shape information; 2) people may draw the same object in very different styles. Thus, it is a great challenge for a computer to achieve a robust representation for sketch recognition tasks.

Generally, existing sketch recognition methods follow a strategy similar to that of image recognition. Earlier methods use hand-crafted features, such as GF-HOG [1], SIFT [2], Self Similarity (SSIM) [3], HOG [4], Structure Tensor [5] and Fisher Vector [6]. These descriptors are often combined with bag of visual words (BOW) [7], [8] to yield global features. However, their recognition accuracy still falls short of human performance.
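
As a rough illustration of how local descriptors can be pooled into a global BOW feature, the following minimal sketch assigns each descriptor to its nearest codeword and builds a normalized histogram. The function names, the toy codebook and the toy descriptors are assumptions made for illustration; they are not taken from the cited methods, which typically learn the codebook by clustering real descriptors such as SIFT or HOG.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(descriptor, codebook[i]))


def bow_histogram(descriptors, codebook):
    """Normalized histogram of codeword assignments (the global feature)."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Toy data (hypothetical): a 2-word codebook and four 2-D local descriptors.
codebook = [[0.0, 0.0], [1.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.0, 0.8], [0.2, 0.1]]
histogram = bow_histogram(descriptors, codebook)
```

The resulting histogram has one bin per codeword, so sketches with different numbers of local descriptors map to feature vectors of the same length, which is what allows a standard classifier to be applied afterwards.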

Recently, with the advent of deep convolutional neural networks (DCNNs) [9], [10] and large-scale sketch datasets, such as TU-Berlin [11] and Sketchy [12], DCNNs have been effectively explored for recognizing sketch objects. DCNNs can learn more distinctive features, and thus improve sketch classification and retrieval performance in comparison with hand-crafted visual features. The first attempt to utilize CNNs for free-hand sketch recognition used two popular CNNs, AlexNet [9] and LeNet [10], and the classification results on sketch datasets demonstrate a great improvement over hand-crafted features. Since then, more powerful frameworks have been introduced in [13], [14], [15], [16], [17]. The classification performance on the TU-Berlin dataset has been improved to 75.42% [13], 77.95% [16] and 80.42% [17], respectively. However, it is still far behind the accuracy of natural RGB image recognition.

The key issue in sketch recognition is to learn distinctive and powerful features. So far, most models only consider appearance information, e.g., color and texture, while few studies consider shape information. We believe that sketch recognition can be further improved by considering features of both appearance and shape. Based on this, we propose a novel convolutional neural network-based architecture, named Hybrid CNN, for sketch recognition. Hybrid CNN consists of two CNN streams that extract sketch features: one stream reflects appearance structure and the other extracts shape information. The success of our idea depends on the capability of extracting discriminative shape features for each sketch category. For this purpose, we further develop a shape CNN (S-Net), one stream of Hybrid CNN, which transforms a sketch into point set data and performs convolutional operations on it to extract shape features.
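
To make the idea of turning a sketch into point set data more concrete, here is a minimal sketch of one possible preprocessing step. It is an assumption for illustration only: the text above does not specify how points are sampled or normalized, and the function name `sketch_to_point_set`, the random sampling strategy and the unit-radius normalization are all hypothetical choices rather than the S-Net procedure.

```python
import math
import random


def sketch_to_point_set(image, num_points=8, seed=0):
    """Convert a binary raster sketch into a fixed-size, normalized 2-D point set.

    `image` is a list of rows; any nonzero entry marks a stroke pixel.
    """
    # Collect the (x, y) coordinates of every stroke pixel.
    points = [(float(x), float(y))
              for y, row in enumerate(image)
              for x, value in enumerate(row) if value]
    if not points:
        raise ValueError("sketch contains no stroke pixels")

    # Sample a fixed number of points (with replacement if too few pixels).
    rng = random.Random(seed)
    if len(points) >= num_points:
        sampled = rng.sample(points, num_points)
    else:
        sampled = [rng.choice(points) for _ in range(num_points)]

    # Center on the centroid and scale so the farthest point has radius 1.
    cx = sum(p[0] for p in sampled) / num_points
    cy = sum(p[1] for p in sampled) / num_points
    centered = [(x - cx, y - cy) for x, y in sampled]
    radius = max(math.hypot(x, y) for x, y in centered) or 1.0
    return [(x / radius, y / radius) for x, y in centered]


# Toy 4x4 sketch (hypothetical): a small ring of stroke pixels.
image = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
points = sketch_to_point_set(image, num_points=6)
```

Centering and rescaling in this way only removes the effect of translation and scale; any rotation invariance would have to come from the network that consumes the point set, which this sketch does not cover.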

In summary, the main contributions of this paper are threefold:

  • We propose a novel Hybrid CNN architecture to address the problem of sketch recognition. Traditional models only consider appearance information, whereas Hybrid CNN considers not only appearance features but also shape features.

  • We develop a point set-based deep neural network, S-Net, to extract shape features of a sketch that are invariant to sketch rotation and transformation.

  • We conduct comprehensive experiments on two tasks: sketch classification and sketch-based image retrieval. Our proposed two-stream framework achieves superior performance compared with the state-of-the-art methods.

The remainder of this paper is organized as follows. Section 2 briefly reviews the related literature. The proposed Hybrid CNN is described in Section 3. The experimental results and analysis are presented in Sections 4 and 5. Finally, we summarize our contributions and future work in Section 6.

Section snippets

Related work

Sketch Classification. A large number of methods have been proposed for sketch classification in recent decades. These methods generally share the same idea as image classification. The pipeline usually consists of two steps: feature extraction and classification. First, feature descriptors of the sketch are generated. Then, classifiers are used to predict the class labels. Basically, these methods can be divided into two categories: BOW-based models and deep learning-based models. Eitz et al.

The proposed method

In this section, we illustrate the Hybrid CNN architecture consisting of two branches, and then we give the details of each part.
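
The two-branch idea can be sketched in a few lines. The example below is a toy illustration under stated assumptions: the fusion rule (simple concatenation), the linear classifier, and all feature values and weights are hypothetical stand-ins, since the details of how the two branches are combined are not given in this excerpt.

```python
import math


def fuse_features(appearance, shape):
    """Late fusion by concatenation (an assumed fusion rule)."""
    return list(appearance) + list(shape)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, biases):
    """Linear classifier over the fused feature; returns class probabilities."""
    scores = [sum(w * f for w, f in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)


# Toy features standing in for the outputs of the two branches.
appearance_feature = [0.2, 0.7]     # hypothetical appearance-branch output
shape_feature = [0.9, 0.1, 0.4]     # hypothetical shape-branch output
fused = fuse_features(appearance_feature, shape_feature)

# Hypothetical weights for a 2-class toy problem (one row per class).
weights = [[1.0, 0.0, 1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 1.0, 0.0]]
biases = [0.0, 0.0]
probabilities = classify(fused, weights, biases)
predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
```

Concatenation keeps the two representations separate until the classifier, so either branch can be swapped or ablated without changing the other; other fusion rules, such as weighted score averaging, would fit the same interface.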

Sketch classification experiment

In this section, we evaluate our proposed Hybrid CNN on the sketch classification task. We first describe the datasets used to verify our method. Then we report the performance and discuss the results in detail.

Sketch-based image retrieval experiment

In this section, we show the application of Hybrid CNN to the SBIR task.
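
Retrieval quality in SBIR is reported here as MAP (mean average precision). The following minimal sketch shows the standard computation for binary relevance labels; it assumes each ranked list covers the whole gallery, so that the number of hits equals the number of relevant images. The function names and the toy rankings are illustrative and do not come from the paper's evaluation code.

```python
def average_precision(ranked_relevance):
    """AP for one query; ranked_relevance[i] is True if result i is relevant."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            # Precision at this rank, accumulated only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_ranked_relevance):
    """Mean of the per-query average precisions."""
    aps = [average_precision(r) for r in all_ranked_relevance]
    return sum(aps) / len(aps) if aps else 0.0


# Toy rankings (hypothetical) for two sketch queries.
rankings = [
    [True, False, True],   # relevant images at ranks 1 and 3
    [False, True],         # relevant image at rank 2
]
map_score = mean_average_precision(rankings)
```

For the toy rankings, the first query has an AP of (1/1 + 2/3) / 2 and the second an AP of 1/2, so the MAP is their mean.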

Conclusion

In this paper, we propose a deep learning-based framework for sketch recognition named Hybrid CNN. Hybrid CNN obtains an efficient and comprehensive representation of sketches, and the shape features improve the accuracy of sketch recognition by 2%–5% over the existing state of the art. Based on the proposed method, we demonstrate state-of-the-art performance on sketch classification and SBIR tasks on the TU-Berlin, Sketchy and Flickr15K-Large datasets.

In the future, although deep learning-based

Acknowledgements

This work is supported by the National Natural Science Foundation of China (61273364, 61473031, and 61472029) and the Fundamental Research Funds for the Central Universities (2016YJS041, 2018YJS035).

References (62)

  • R. Hu et al., Markov random fields for sketch based video retrieval, ACM Conference on International Conference on Multimedia Retrieval (2013)
  • D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. (2004)
  • E. Shechtman et al., Matching local self-similarities across images and videos, Computer Vision and Pattern Recognition, 2007. CVPR ’07. IEEE Conference on (2007)
  • N. Dalal et al., Histograms of oriented gradients for human detection, IEEE Computer Society Conference on Computer Vision Pattern Recognition (2005)
  • M. Eitz et al., A descriptor for large scale image retrieval based on sketched feature lines, Eurographics Symposium on Sketch-Based Interfaces and Modeling (2009)
  • T. Tuytelaars, Sketch classification and classification-driven analysis using fisher vectors (2014)
  • T. Joachims, Text categorization with support vector machines: learning with many relevant features, European Conference on Machine Learning (1998)
  • A. McCallum et al., A comparison of event models for naive Bayes text classification, AAAI-98 Workshop on Learning for Text Categorization (1998)
  • A. Krizhevsky et al., Imagenet classification with deep convolutional neural networks, International Conference on Neural Information Processing Systems (2012)
  • Y. Lecun et al., Gradient-based learning applied to document recognition, Proc. IEEE (1998)
  • M. Eitz et al., How do humans sketch objects?, ACM Trans. Graph. (2012)
  • P. Sangkloy et al., The sketchy database: learning to retrieve badly drawn bunnies, ACM Trans. Graph. (2016)
  • O. Seddati et al., Deepsketch: deep convolutional neural networks for sketch recognition and similarity search, International Workshop on Content-Based Multimedia Indexing (2015)
  • Y. Li et al., Sketch recognition by ensemble matching of structured features, British Machine Vision Conference (2013)
  • Y. Li et al., Free-hand sketch recognition by multi-kernel feature learning, Comput. Vis. Image Understanding (2015)
  • Q. Yu, Y. Yang, Y.Z. Song, T. Xiang, T. Hospedales, Sketch-a-net that beats humans...
  • H. Zhang et al., Sketchnet: sketch classification with web images, Computer Vision and Pattern Recognition (2016)
  • Z. Li et al., Beyond trace ratio: weighted harmonic mean of trace ratios for multiclass discriminant analysis, IEEE Trans. Knowl. Data Eng. (2017)
  • X. Chang et al., Semantic pooling for complex event analysis in untrimmed videos, IEEE Trans. Pattern Anal. Mach. Intell. (2017)
  • H. Su, S. Maji, E. Kalogerakis, E. Learnedmiller, Multi-view convolutional neural networks for 3d shape recognition...
  • M. Eitz et al., Sketch-based image retrieval: benchmark and bag-of-features descriptors, IEEE Trans. Vis. Comput. Graph. (2011)
  • R. Hu et al., Gradient field descriptor for sketch based retrieval and localization, IEEE International Conference on Image Processing (2010)
  • L. Zhu et al., Exploring auxiliary context: discrete semantic transfer hashing for scalable image retrieval, IEEE Trans. Neural Netw. Learn. Syst. (2018)
  • J.M. Saavedra et al., An Improved Histogram of Edge Local Orientations for Sketch-Based Image Retrieval (2010)
  • J.M. Saavedra, Sketch based image retrieval using a soft computation of the histogram of edge local orientations (s-helo), IEEE International Conference on Image Processing (2015)
  • S. Belongie et al., Shape matching and object recognition using shape contexts, IEEE International Conference on Computer Science and Information Technology (2010)
  • Y. Cao et al., Edgel index for large-scale sketch-based image search, Computer Vision and Pattern Recognition (2011)
  • Y. Qi et al., Making better use of edges via perceptual grouping, Computer Vision and Pattern Recognition (2015)
  • C. Xiao et al., Sketch-based image retrieval via shape words, ACM on International Conference on Multimedia Retrieval (2015)
  • Y. Zheng et al., Discovering discriminative patches for free-hand sketch analysis, Multimedia Syst. (2017)
  • Y. Qi et al., Sketch-based image retrieval via siamese convolutional neural network, IEEE International Conference on Image Processing (2016)

Conflict of interest: We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.
