A Hybrid convolutional neural network for sketch recognition

doi:10.1016/j.patrec.2019.01.006

Pattern Recognition Letters

Volume 130, February 2020, Pages 73-82

https://doi.org/10.1016/j.patrec.2019.01.006

Highlights

• We propose a novel Hybrid CNN architecture to address the problem of sketch recognition.
• S-Net extracts shape features that are invariant to sketch rotation and transformation.
• Hybrid CNN achieves state-of-the-art results on sketch classification and sketch-based image retrieval tasks.
• The classification accuracy is 84.42% and 82.74% on the TU-Berlin and SketchX datasets, respectively.
• The MAP of SBIR is 57.4% and 28.74% on the SketchX and Flickr15K-Large datasets, respectively.

Abstract

With the popularity of touch-screen devices, understanding users’ free-hand sketches is becoming increasingly important in computer vision and human-computer interaction. Most existing sketch recognition methods employ strategies similar to those used in image recognition, relying on appearance information represented by hand-crafted features or deep features from convolutional neural networks. We believe that sketch recognition can benefit from learning both appearance and shape representations. In this paper, we propose a novel architecture, named Hybrid CNN, which is composed of A-Net and S-Net. They describe appearance information and shape information, respectively. Hybrid CNN is then comprehensively evaluated on sketch classification and retrieval tasks on different datasets, including TU-Berlin, Sketchy and Flickr15k. Experimental results demonstrate that Hybrid CNN achieves competitive accuracy compared with state-of-the-art methods.

Introduction

Sketching is widely used in daily life, and the free-hand sketch is a simple yet powerful tool for communicating, recording and expressing ideas. Sketch recognition has attracted growing attention due to the widespread use of touch-screens on portable devices. However, it is very difficult to interpret free-hand sketches automatically. Two of the reasons are that: 1) natural images contain abundant details of color or texture, whereas sketches are highly abstract and contain only limited shape information; 2) people may present the same object using very different drawing styles. Thus, it is a great challenge for a computer to achieve a robust representation for sketch recognition tasks.

Generally, existing sketch recognition methods follow a strategy similar to that of image recognition. The earlier methods use hand-crafted features, such as GF-HOG [1], SIFT [2], Self Similarity (SSIM) [3], HOG [4], Structure Tensor [5] and Fisher Vector [6]. These descriptors are often combined with bag of visual words (BOW) [7], [8] to yield global features. However, their recognition accuracy still falls short of human performance.
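As a rough illustration of the bag-of-visual-words step mentioned above, the following minimal sketch quantizes local descriptors against a codebook and returns a normalized histogram as the global feature. The function name `bow_histogram`, the toy 2-D descriptors and the two-word codebook are assumptions made for illustration only; in practice the codebook would be learned (for example with k-means) from descriptors such as SIFT or HOG.

```python
def bow_histogram(descriptors, codebook):
    """Quantize local descriptors against a codebook and return a
    normalized histogram of visual-word counts (a global BOW feature)."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        # Assign the descriptor to its nearest codeword (squared Euclidean distance).
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[k])),
        )
        counts[nearest] += 1
    total = sum(counts) or 1  # guard against an empty descriptor list
    return [count / total for count in counts]


# Hypothetical toy data: two visual words and four local descriptors.
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [9.0, 9.0], [0.0, 1.0], [10.0, 11.0]]
histogram = bow_histogram(descriptors, codebook)
```

The histogram length equals the codebook size, so sketches with different numbers of local descriptors map to vectors of the same dimension that a standard classifier can consume.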

Recently, benefiting from deep convolutional neural networks (DCNNs) [9], [10] and large-scale sketch datasets, such as TU-Berlin [11] and SketchX [12], researchers have effectively applied DCNNs to sketch recognition. DCNNs learn more distinctive features and thus improve sketch classification and retrieval performance over hand-crafted visual features. The first attempt to use CNNs for free-hand sketch recognition employed two popular networks, AlexNet [9] and LeNet [10], and the classification results on sketch datasets showed a large improvement over hand-crafted features. More powerful frameworks were then introduced in [13], [14], [15], [16], [17]. The classification accuracy on the TU-Berlin dataset has been improved to 75.42% [13], 77.95% [16] and 80.42% [17], respectively. However, it is still far behind the accuracy of natural RGB image recognition.

The key issue in sketch recognition is to learn distinctive and powerful features. So far, most models only consider appearance information, e.g., color and texture, while few studies consider shape information. We believe that sketch recognition can be further improved by considering features of both appearance and shape. Based on this, we propose a novel convolutional neural network-based architecture, named Hybrid CNN, for sketch recognition. Hybrid CNN consists of two CNN streams that extract sketch features. One stream reflects appearance structure and the other extracts shape information. The success of our idea depends on the capability of extracting discriminative shape features for each sketch category. For this purpose, we further develop a shape CNN (S-Net), one stream of Hybrid CNN, which transforms a sketch into a point set and applies convolution operations to it to extract shape features.
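To make the idea of turning a sketch into point set data more concrete, here is a minimal sketch under stated assumptions. This preview does not describe how S-Net samples or normalizes points, so the function `sketch_to_point_set`, the even subsampling, and the centroid/radius normalization below are illustrative choices, not the authors' procedure. The normalization shown removes translation and scale only; it does not by itself give rotation invariance.

```python
import math


def sketch_to_point_set(image, num_points=8):
    """Turn a binary raster (list of rows, truthy = stroke pixel) into a
    fixed-size, translation- and scale-normalized list of (x, y) points."""
    # Collect the coordinates of every stroke pixel in row-major order.
    points = [
        (float(x), float(y))
        for y, row in enumerate(image)
        for x, value in enumerate(row)
        if value
    ]
    if not points:
        raise ValueError("sketch contains no stroke pixels")
    # Evenly subsample (or repeat) pixels so every sketch yields num_points points.
    step = len(points) / num_points
    sampled = [points[int(i * step)] for i in range(num_points)]
    # Centre on the centroid, then scale so the farthest point has radius 1.
    cx = sum(p[0] for p in sampled) / num_points
    cy = sum(p[1] for p in sampled) / num_points
    centred = [(x - cx, y - cy) for x, y in sampled]
    radius = max(math.hypot(x, y) for x, y in centred) or 1.0
    return [(x / radius, y / radius) for x, y in centred]


# Hypothetical 4x4 sketch containing a single diagonal stroke.
diagonal = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
point_set = sketch_to_point_set(diagonal, num_points=4)
```

A point set of this kind can then be fed to a network that operates on coordinates rather than on pixels, which is the general setting the paragraph above describes.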

In summary, the main contributions of this paper are threefold:

• We propose a novel Hybrid CNN architecture to address the problem of sketch recognition. Traditional models only consider appearance information, whereas Hybrid CNN considers not only appearance features but also shape features.
• We develop a point set-based deep neural network, S-Net, to extract shape features of a sketch that are invariant to sketch rotation and transformation.
• We conduct comprehensive experiments on two tasks: sketch classification and sketch-based image retrieval. Our proposed two-stream framework achieves superior performance compared with state-of-the-art methods.

The remainder of this paper is organized as follows. Section 2 briefly reviews the related literature. The proposed Hybrid CNN is described in Section 3. The experimental results and analysis are presented in Sections 4 and 5. Finally, we summarize our contributions and future work in Section 6.

Section snippets

Related work

Sketch Classification. A large number of methods have been proposed for sketch classification in recent decades. These methods generally share the same idea as image classification. The pipeline usually consists of two steps: feature extraction and classification. First, feature descriptors of the sketch are generated. Then, classifiers are used to predict the class labels. Basically, these methods can be divided into two categories: BOW-based models and deep learning-based models. Eitz et al.

The proposed method

In this section, we illustrate the Hybrid CNN architecture consisting of two branches, and then we give the details of each part.
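The preview does not say how the two branches are combined, so the following is only a hedged sketch of one common option: late fusion by concatenating L2-normalized feature vectors. It also shows max pooling over per-point features, a standard way to obtain a shape descriptor that does not depend on point order. The helper names (`pool_point_features`, `l2_normalise`, `fuse_features`) and all numeric values are hypothetical.

```python
import math


def pool_point_features(point_features):
    """Max-pool per-point feature vectors into one vector that is
    independent of the order in which the points are listed."""
    return [max(column) for column in zip(*point_features)]


def l2_normalise(vector):
    """Scale a vector to unit Euclidean length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


def fuse_features(appearance, shape):
    """Late fusion (assumed): concatenate the two normalized descriptors."""
    return l2_normalise(appearance) + l2_normalise(shape)


# Hypothetical outputs of an appearance branch and of per-point shape features.
appearance_feature = [3.0, 4.0]
per_point_features = [[0.0, 1.0], [0.0, 2.0], [0.0, 0.5]]
shape_feature = pool_point_features(per_point_features)
fused = fuse_features(appearance_feature, shape_feature)
```

Normalizing each branch before concatenation keeps one stream from dominating the fused descriptor purely because of its scale.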

Sketch classification experiment

In this section, we evaluate our proposed Hybrid CNN on the sketch classification task. We first describe the datasets that are used to verify our method. Then we report the performance and discuss the results in detail.

Sketch-based image retrieval experiment

In this section, we show the application of Hybrid CNN to the SBIR task.
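SBIR results in this paper are reported as mean average precision (MAP). As a reminder of how that metric is usually computed, the sketch below follows the standard definition; it assumes each ranked list covers the whole gallery, so every relevant image appears somewhere in the list. The function names and the toy rankings are illustrative and are not taken from the paper's evaluation code.

```python
def average_precision(ranked_relevance):
    """Average precision for one query, given relevance flags (1 or 0)
    of the retrieved images in ranked order."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            # Precision at this rank, accumulated only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_rankings):
    """Mean of the per-query average precisions."""
    return sum(average_precision(r) for r in all_rankings) / len(all_rankings)


# Two hypothetical sketch queries with their ranked retrieval results.
rankings = [
    [1, 0, 1, 0],  # relevant images at ranks 1 and 3
    [0, 1],        # relevant image at rank 2
]
map_score = mean_average_precision(rankings)
```

For the first query the precisions at the relevant ranks are 1/1 and 2/3, and for the second query the single relevant image at rank 2 gives 1/2; MAP is the mean of the two per-query averages.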

Conclusion

In this paper, we propose a deep learning-based framework for sketch recognition named Hybrid CNN. Hybrid CNN obtains an efficient and comprehensive representation of sketches, and the shape features improve the accuracy of sketch recognition by 2%–5% over the existing state-of-the-art. Based on the proposed method, we demonstrate state-of-the-art performance on sketch classification and SBIR tasks on the TU-Berlin, Sketchy and Flickr15K-Large datasets.

In the future, although deep learning-based

Acknowledgements

This work is supported by the National Natural Science Foundation of China (61273364, 61473031, and 61472029) and the Fundamental Research Funds for the Central Universities (2016YJS041, 2018YJS035).

References (62)

R. Hu et al. Markov random fields for sketch based video retrieval. ACM Conference on International Conference on Multimedia Retrieval (2013)
D.G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. (2004)
E. Shechtman et al. Matching local self-similarities across images and videos. Computer Vision and Pattern Recognition, 2007. CVPR ’07. IEEE Conference on (2007)
N. Dalal et al. Histograms of oriented gradients for human detection. IEEE Computer Society Conference on Computer Vision Pattern Recognition (2005)
M. Eitz et al. A descriptor for large scale image retrieval based on sketched feature lines. Eurographics Symposium on Sketch-Based Interfaces and Modeling (2009)
T. Tuytelaars. Sketch classification and classification-driven analysis using fisher vectors (2014)
T. Joachims. Text categorization with support vector machines: learning with many relevant features. European Conference on Machine Learning (1998)
A. McCallum et al. A comparison of event models for naive Bayes text classification. AAAI-98 Workshop on Learning for Text Categorization (1998)
A. Krizhevsky et al. Imagenet classification with deep convolutional neural networks. International Conference on Neural Information Processing Systems (2012)
Y. Lecun et al. Gradient-based learning applied to document recognition. Proc. IEEE (1998)
M. Eitz et al. How do humans sketch objects? ACM Trans. Graph. (2012)
P. Sangkloy et al. The sketchy database: learning to retrieve badly drawn bunnies. ACM Trans. Graph. (2016)
O. Seddati et al. Deepsketch: deep convolutional neural networks for sketch recognition and similarity search. International Workshop on Content-Based Multimedia Indexing (2015)
Y. Li et al. Sketch recognition by ensemble matching of structured features. British Machine Vision Conference (2013)
Y. Li et al. Free-hand sketch recognition by multi-kernel feature learning. Comput. Vis. Image Understanding (2015)
Q. Yu, Y. Yang, Y.Z. Song, T. Xiang, T. Hospedales, Sketch-a-net that beats humans...
H. Zhang et al. Sketchnet: sketch classification with web images. Computer Vision and Pattern Recognition (2016)
Z. Li et al. Beyond trace ratio: weighted harmonic mean of trace ratios for multiclass discriminant analysis. IEEE Trans. Knowl. Data Eng. (2017)
X. Chang et al. Semantic pooling for complex event analysis in untrimmed videos. IEEE Trans. Pattern Anal. Mach. Intell. (2017)
H. Su, S. Maji, E. Kalogerakis, E. Learnedmiller, Multi-view convolutional neural networks for 3d shape recognition...
M. Eitz et al. Sketch-based image retrieval: benchmark and bag-of-features descriptors. IEEE Trans. Vis. Comput. Graph. (2011)
R. Hu et al. Gradient field descriptor for sketch based retrieval and localization. IEEE International Conference on Image Processing (2010)
L. Zhu et al. Exploring auxiliary context: discrete semantic transfer hashing for scalable image retrieval. IEEE Trans. Neural Netw. Learn. Syst. (2018)
J.M. Saavedra et al. An Improved Histogram of Edge Local Orientations for Sketch-Based Image Retrieval (2010)
J.M. Saavedra. Sketch based image retrieval using a soft computation of the histogram of edge local orientations (s-helo). IEEE International Conference on Image Processing (2015)
S. Belongie et al. Shape matching and object recognition using shape contexts. IEEE International Conference on Computer Science and Information Technology (2010)
Y. Cao et al. Edgel index for large-scale sketch-based image search. Computer Vision and Pattern Recognition (2011)
Y. Qi et al. Making better use of edges via perceptual grouping. Computer Vision and Pattern Recognition (2015)
C. Xiao et al. Sketch-based image retrieval via shape words. ACM on International Conference on Multimedia Retrieval (2015)
Y. Zheng et al. Discovering discriminative patches for free-hand sketch analysis. Multimedia Syst. (2017)
Y. Qi et al. Sketch-based image retrieval via siamese convolutional neural network. IEEE International Conference on Image Processing (2016)


^☆: Conflict of interest: We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.
