A Hybrid convolutional neural network for sketch recognition☆
Introduction
Sketching is widely used in daily life, and the free-hand sketch is a simple yet powerful tool for communication, recording, and expression. Sketch recognition has attracted growing attention owing to the widespread use of touch screens on portable devices. However, interpreting free-hand sketches automatically is very difficult, for at least two reasons: 1) natural images contain abundant color and texture detail, whereas sketches are highly abstract and carry only limited shape information; 2) people may depict the same object in very different drawing styles. Achieving a robust representation for sketch recognition is therefore a great challenge for a computer.
In general, existing sketch recognition methods follow a strategy similar to that of image recognition. Earlier methods use hand-crafted features, such as GF-HOG [1], SIFT [2], Self Similarity (SSIM) [3], HOG [4], Structure Tensor [5] and Fisher Vector [6]. These descriptors are often combined with bag of visual words (BOW) [7], [8] to yield global features. However, their recognition accuracy still falls short of human performance.
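To make the bag-of-visual-words step concrete, the following is a minimal sketch of how local descriptors can be pooled into one global histogram. It assumes a codebook of visual words has already been learned (for example by k-means, which is not shown); the function name `bow_histogram` and the toy data are illustrative assumptions and do not come from the cited methods.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Encode local descriptors (n, d) as an L1-normalized histogram
    over the visual words of a codebook (k, d)."""
    # Squared Euclidean distance from every descriptor to every visual word.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Hard-assign each descriptor to its nearest visual word.
    words = np.argmin(dists, axis=1)
    hist = np.bincount(words, minlength=codebook.shape[0]).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Toy data (hypothetical): 2-D descriptors and a 3-word codebook.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 0.1], [0.0, 0.8], [0.1, 0.9]])
hist = bow_histogram(descriptors, codebook)
```

The resulting fixed-length histogram can be passed to any standard classifier regardless of how many local descriptors a sketch produces.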
Recently, with the advent of deep convolutional neural networks (DCNNs) [9], [10] and large-scale sketch datasets such as TU-Berlin [11] and SketchX [12], DCNNs have been explored effectively for recognizing sketched objects. DCNNs can learn more distinctive features and thus improve sketch classification and retrieval performance over hand-crafted visual features. The first attempt to use CNNs for free-hand sketch recognition applied two popular networks, AlexNet [9] and LeNet [10], and the classification results on sketch datasets showed a large improvement over hand-crafted features. More powerful frameworks were then introduced in [13], [14], [15], [16], [17]. Classification accuracy on the TU-Berlin dataset has been improved to 75.42% [13], 77.95% [16] and 80.42% [17], respectively. However, this is still far behind the accuracy achieved in natural RGB image recognition.
The key issue in sketch recognition is learning distinctive and powerful features. So far, most models consider only appearance information, e.g., color and texture, while few studies consider shape information. We believe that sketch recognition can be further improved by considering both appearance and shape features. On this basis, we propose in this paper a novel convolutional neural network-based architecture, named Hybrid CNN, for sketch recognition. Hybrid CNN consists of two CNN streams that extract sketch features: one stream captures appearance structure and the other extracts shape information. The success of this idea depends on the ability to extract discriminative shape features for each sketch category. For this purpose, we further develop a shape CNN (S-Net), one stream of Hybrid CNN, which transforms a sketch into point set data and performs convolutional operations on it to extract shape features.
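As an illustration of what "transforming a sketch into point set data" could look like, here is a minimal sketch that samples stroke pixels from a binary raster and normalizes them. The sampling scheme, the number of points, and the function name `sketch_to_point_set` are assumptions made for this example; the text above does not specify how S-Net actually builds its point sets.

```python
import numpy as np

def sketch_to_point_set(image, num_points=256, seed=0):
    """Convert a binary sketch raster (H, W), where stroke pixels are > 0,
    into a normalized (num_points, 2) array of (x, y) coordinates."""
    ys, xs = np.nonzero(image)
    if len(xs) == 0:
        raise ValueError("sketch contains no stroke pixels")
    points = np.stack([xs, ys], axis=1).astype(float)
    rng = np.random.default_rng(seed)
    # Sample with replacement only if there are too few stroke pixels.
    replace = len(points) < num_points
    idx = rng.choice(len(points), size=num_points, replace=replace)
    sampled = points[idx]
    # Center at the centroid and scale so the farthest point has radius 1.
    sampled -= sampled.mean(axis=0)
    radius = np.linalg.norm(sampled, axis=1).max()
    if radius > 0:
        sampled /= radius
    return sampled

# Toy sketch (hypothetical): a cross drawn on a 32x32 canvas.
canvas = np.zeros((32, 32), dtype=np.uint8)
canvas[16, 4:28] = 1   # horizontal stroke
canvas[4:28, 16] = 1   # vertical stroke
pts = sketch_to_point_set(canvas, num_points=64)
```

Centering and rescaling remove only position and size differences; any rotation invariance would have to come from the network that consumes the points.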
In summary, the main contributions of this paper are threefold:
- We propose a novel Hybrid CNN architecture to address the problem of sketch recognition. Traditional models consider only appearance information, whereas Hybrid CNN considers both appearance features and shape features.
- We develop a point set-based deep neural network, S-Net, to extract shape features of a sketch, which is invariant to sketch rotation and transformation.
- We conduct comprehensive experiments on two tasks: sketch classification and sketch-based image retrieval. Our proposed two-stream framework achieves superior performance compared with state-of-the-art methods.
The remainder of this paper is organized as follows. Section 2 briefly reviews the related literature. The proposed Hybrid CNN is described in Section 3. The experimental results and analysis are presented in Sections 4 and 5. Finally, we summarize our contributions and future work in Section 6.
Related work
Sketch Classification. A large number of methods have been proposed for sketch classification in recent decades. These methods generally share the basic idea of image classification. The pipeline usually consists of two steps: feature extraction and classification. First, feature descriptors of the sketch are generated. Then, classifiers are used to predict the class labels. These methods can be divided into two categories: BOW-based models and deep learning-based models. Eitz et al.
The proposed method
In this section, we illustrate the Hybrid CNN architecture consisting of two branches, and then we give the details of each part.
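One common way to combine two feature branches is late fusion by concatenation, sketched below. This is an assumption chosen for illustration only: the fusion rule, the feature dimensions, and the names `l2_normalize` and `fuse_and_classify` are hypothetical, and random arrays stand in for the outputs of the appearance and shape branches.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row vector to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def fuse_and_classify(appearance_feat, shape_feat, weights, bias):
    """Concatenate two per-sketch feature vectors and apply a linear
    softmax classifier (assumed fusion rule, for illustration only)."""
    fused = np.concatenate(
        [l2_normalize(appearance_feat), l2_normalize(shape_feat)], axis=-1
    )
    logits = fused @ weights + bias
    # Subtract the row maximum for numerical stability before exponentiating.
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
appearance_feat = rng.normal(size=(4, 8))   # stand-in for appearance-branch output
shape_feat = rng.normal(size=(4, 6))        # stand-in for shape-branch output
weights = rng.normal(size=(14, 3))          # 8 + 6 fused dims, 3 classes
bias = np.zeros(3)
probs = fuse_and_classify(appearance_feat, shape_feat, weights, bias)
```

Normalizing each branch before concatenation keeps one stream from dominating the fused vector purely because of its scale.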
Sketch classification experiment
In this section, we evaluate the proposed Hybrid CNN on the sketch classification task. We first describe the datasets used to verify our method. We then report the performance and discuss the results in detail.
Sketch-based image retrieval experiment
In this section, we show the application of Hybrid CNN to the SBIR task.
Conclusion
In this paper, we propose a deep learning-based framework for sketch recognition named Hybrid CNN. Hybrid CNN obtains an efficient and comprehensive representation of sketches, and the shape features improve sketch recognition accuracy by 2%–5% over the existing state of the art. With the proposed method, we demonstrate state-of-the-art performance on sketch classification and SBIR tasks on the TU-Berlin, Sketchy and Flickr15K-Large datasets.
In the future, although deep learning-based
Acknowledgements
This work is supported by the National Natural Science Foundation of China (61273364, 61473031, and 61472029) and the Fundamental Research Funds for the Central Universities (2016YJS041, 2018YJS035).
References (62)
- Markov random fields for sketch based video retrieval. ACM Conference on International Conference on Multimedia Retrieval (2013)
- Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. (2004)
- Matching local self-similarities across images and videos. Computer Vision and Pattern Recognition, 2007. CVPR ’07. IEEE Conference on (2007)
- Histograms of oriented gradients for human detection. IEEE Computer Society Conference on Computer Vision Pattern Recognition (2005)
- A descriptor for large scale image retrieval based on sketched feature lines. Eurographics Symposium on Sketch-Based Interfaces and Modeling (2009)
- Sketch classification and classification-driven analysis using fisher vectors (2014)
- Text categorization with support vector machines: learning with many relevant features. European Conference on Machine Learning (1998)
- A comparison of event models for naive Bayes text classification. AAAI-98 Workshop on Learning for Text Categorization (1998)
- Imagenet classification with deep convolutional neural networks. International Conference on Neural Information Processing Systems (2012)
- Gradient-based learning applied to document recognition. Proc. IEEE (1998)
- How do humans sketch objects? ACM Trans. Graph.
- The sketchy database: learning to retrieve badly drawn bunnies. ACM Trans. Graph.
- Deepsketch: deep convolutional neural networks for sketch recognition and similarity search. International Workshop on Content-Based Multimedia Indexing
- Sketch recognition by ensemble matching of structured features. British Machine Vision Conference
- Free-hand sketch recognition by multi-kernel feature learning. Comput. Vis. Image Understanding
- Sketchnet: sketch classification with web images. Computer Vision and Pattern Recognition
- Beyond trace ratio: weighted harmonic mean of trace ratios for multiclass discriminant analysis. IEEE Trans. Knowl. Data Eng.
- Semantic pooling for complex event analysis in untrimmed videos. IEEE Trans. Pattern Anal. Mach. Intell.
- Sketch-based image retrieval: benchmark and bag-of-features descriptors. IEEE Trans. Vis. Comput. Graph.
- Gradient field descriptor for sketch based retrieval and localization. IEEE International Conference on Image Processing
- Exploring auxiliary context: discrete semantic transfer hashing for scalable image retrieval. IEEE Trans. Neural Netw. Learn. Syst.
- An Improved Histogram of Edge Local Orientations for Sketch-Based Image Retrieval
- Sketch based image retrieval using a soft computation of the histogram of edge local orientations (s-helo). IEEE International Conference on Image Processing
- Shape matching and object recognition using shape contexts. IEEE International Conference on Computer Science and Information Technology
- Edgel index for large-scale sketch-based image search. Computer Vision and Pattern Recognition
- Making better use of edges via perceptual grouping. Computer Vision and Pattern Recognition
- Sketch-based image retrieval via shape words. ACM on International Conference on Multimedia Retrieval
- Discovering discriminative patches for free-hand sketch analysis. Multimedia Syst.
- Sketch-based image retrieval via siamese convolutional neural network. IEEE International Conference on Image Processing
☆ Conflict of interest: We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.