Abstract
Humans have an excellent ability to recognize freehand sketch drawings despite their abstract and sparse structures. Understanding freehand sketches with automated methods is a challenging task due to the diversity and abstract structures of these sketches. In this paper, we propose an efficient freehand sketch recognition scheme, which is based on the feature-level fusion of Convolutional Neural Networks (CNNs) in the transfer learning context. Specifically, we analyse different layer performances of distinct ImageNet pretrained CNNs and combine best performing layer features within the CNN-SVM pipeline for recognition. We also employ Principal Component Analysis (PCA) to reduce the fused deep feature dimensions to ensure the efficiency of the recognition application on the limited-capacity devices. We perform evaluations on two real sketch benchmark datasets, namely the Sketchy and the TU-Berlin to show the effectiveness of the proposed scheme. Our experimental results show that, the feature-level fusion scheme with the PCA achieves a recognition accuracy of 97.91% and 72.5% on the Sketchy and TU-Berlin datasets, respectively. This result is promising when compared with the human recognition accuracy of 73.1% on the TU-Berlin dataset. We also develop a sketch recognition application for smart devices to demonstrate the proposed scheme.







Similar content being viewed by others
References
Angelova A, Krizhevsky A, Vanhoucke V, Ogale A, Ferguson D (2015) Real-time pedestrian detection with deep network cascades
Aihkisalo T, Paaso T (2012) Latencies of service invocation and processing of the REST and SOAP Web service interfaces. In: 2012 IEEE 8th world congress on services. Honolulu, pp 100–107
Arjovsky M, Chintala S, Bottou L (2017) Wasserstein GAN. arXiv:1701.07875
Boyaci E, Sert M (2017) Feature-level fusion of deep convolutional neural networks for sketch recognition on smartphones. In: Proceedings of IEEE international conference on consumer electronics (ICCE2017), January 8-10, 2017, Las Vegas, Nevada, USA, pp 485–486
Chang C, Lin C (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):27:1–27:27
Chatfield K, Simonyan K, Vedaldi A, Zisserman A (2014) Return of the devil in the details: delving deep into convolutional nets. In: Proceedings of British machine vision conference (BMVC)
Chen W, Hays J (2018) SketchyGAN: towards diverse and realistic sketch to image synthesis. arXiv:1801.02753
Chen X, Duan Y, Houthooft R, Schulman J, Sutskever I, Abbeel P (2016) InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. In: Proceedings of the 30th international conference on neural information processing systems (NIPS’16). Curran Associates Inc., pp 2180–2188
Creswell A, Bharath AA (2016) Adversarial training for sketch retrieval. In: Computer vision - ECCV 2016 workshops, lecture notes in computer science, vol 9913. Springer, Cham, pp 798–809
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proc. IEEE Comput soc conf comput vis pattern recognit (CVPR), pp 886–893
Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. IEEE Computer Vision and Pattern Recognition (CVPR)
Denton EL, Chintala S, Fergus T et al (2015) Deep generative image models using a Laplacian pyramid of adversarial networks. In: NIPS
Eitz M, Hildebrand K, Boubekeur T, Alexa M (2011) Sketch-based image retrieval: benchmark and bag-of-features descriptors. IEEE Trans Visual Comput Graph 17(11):1624–1636
Eitz M, Hays J, Alexa M (2012) How do humans sketch objects? ACM Trans Graph 31(4):1–10
Ergun H, Akyuz YC, Sert M, Liu J (2016) Early and late level fusion of deep convolutional neural networks for visual concept recognition. Int J Semant Comput 10 (03):379–397
Ergun H, Sert M (2016) Fusing deep convolutional networks for large scale visual concept classification. In: IEEE international conference on multimedia big data (BigMM2016)
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S (2014) Generative adversarial nets. In: Advances in neural information processing systems 27. Curran Associates, Inc., pp 2672–2680
Guo J, Gould S (2015) Deep CNN ensemble with data augmentation for object detection. arXiv:1506.07224
Guo J, Wang C, Roman-Rangel E, Chao H, Rui Y (2016) Building hierarchical representations for oracle character and sketch recognition. IEEE Transactions on Image Processing (TIP)
Isola P, Zhu J, Zhou T, Efros AA (2017) Image-to-image translation with conditional adversarial networks. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR). Honolulu, pp 5967–5976
Jahani-Fariman H, Kavakli M, Boyali A (2018) MATRACK: block sparse Bayesian learning for a sketch recognition approach. Multimed Tools Appl 77 (2):1997–2012
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on multimedia, pp 675–678
Jolliffe L (1986) Principal component analysis. Springer, New York
Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. In: Proceedings of the 2014 IEEE conference on computer vision and pattern recognition, pp 1725–1732
Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst, 1097–1105
LeCun YA, Bottou L, Müller K R, Orr GB (2012) Efficient BackProp. In: Montavon G, Orr GB, Müller KR (eds) Neural networks: tricks of the trade. Lecture notes in computer science, vol 7700, pp 9–48
Li Y, Hospedales TM, Song YZ, Gong S (2015) Free-hand sketch recognition by multi-kernel feature learning. Comput Vis Image Underst 137(C):1–11
Li Y, Song Y, Gong S (2017) Sketch recognition by ensemble matching of structured features. In: BMVC
Liu K, Sun Z, Song M, et al. (2017) Iterative samples labeling for sketch recognition. Multimed Tools Appl 76(10):12819–12852
Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv:1411.1784
Nowak E, Jurie F, Triggs B (2006) Sampling strategies for bag-of features image classification. In: Computer vision - ECCV. Springer, New York, pp 490–503
Oliva A, Torralba A (2001) Modeling the shape of the scene: a holistic representation of the spatial envelope. Int J Comput Vis 42(3):145–175
Platt JC (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Adv. Large margin classifiers. MIT Press, pp 61–74
Qian Y, Yongxin Y, Yi-Zhe S, Xiang T, Hospedales TM (2015) Sketch-a-net that beats humans. In: Proceedings of the British machine vision conference 2015, (BMVC), pp 1–12
Radford A, Metz L, Chintala S (2016) Unsupervised representation learning with deep convolutional generative adversarial networks. In: ICLR
Razavian AS, Azizpour H, Sullivan J, Carlsson S (2014) CNN features off-the-shelf: an astounding baseline for recognition. In: Proceedings of the 2014 IEEE conference on computer vision and pattern recognition workshops (CVPRW ’14). IEEE Computer Society, Washington, DC, pp 512–519
Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X (2016) Improved techniques for training GANs. In: NIPS
Sangkloy P, Burnell N, Ham C, Hays J (2016) The sketchy database: learning to retrieve badly drawn bunnies. ACM Trans Graph 35(4):119:1–119:12
Sarvadevabhatla RK, Babu RV (2015) Freehand sketch recognition using deep features. arXiv:http://arXiv.org/abs/1502.00254
Schneider RG, Tuytelaars T (2014) Sketch classification and classification-driven analysis using fisher vectors. ACM Trans Graph 33(6):1–9
Seddati O, Dupont S, Mahmoudi S (2017) DeepSketch 3 analyzing deep neural networks features for better sketch recognition and sketch-based image retrieval. Multimed Tools Appl 76(21):22333–22359
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:http://arXiv.org/abs/1409.1556
Snoek CGM, Worring M, Smeulders AWM (2005) Early versus late fusion in semantic video analysis. In: Proceedings of the 13th annual ACM international conference on multimedia, pp 399–402
Srinivas S, Ravi Sarvadevabhatla K, Mopuri KR, Prabhu N, Kruthiventi S, Babu RV (2016) A taxonomy of deep convolutional neural nets for computer vision. Front Robot AI, 2(36)
Szegedy C, Liu W, Yangqing J, Sermanet P, Reed SE, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 1–9
Tseng KY, Lin YL, Chen YH, Hsu WH (2012) Sketch-based image retrieval on mobile devices using compact hash bits. In: Proceedings of the 20th ACM international conference on multimedia. ACM, pp 913–916
Wagh K, Thool R (2012) A comparative study of SOAP vs REST web services provisioning techniques for mobile host. J Inf Eng Appl 2(5):12–16. ISSN 2224-5782 (print), ISSN 2225-0506 (online)
Wang L, Sindagi V, Patel V (2018) High-quality facial photo-sketch synthesis using multi-adversarial networks. In: 13th IEEE international conference on automatic face & gesture recognition (FG 2018). Xi’an, pp 83–90
Wu S, Yang H, Zheng S, et al. (2017) Motion sketch based crowd video retrieval. Multimed Tools Appl 76(19):20167–20195
Xiao C, Wang C, Zhang L (2015) PPTLens: create digital objects with sketch images. ACM Conference on Multimedia
Yi Z, Zhang H, Tan P, Gong M (2017) DualGAN: unsupervised dual learning for image-to-image translation. In: 2017 IEEE international conference on computer vision (ICCV). Venice, pp 2868–2876
Yoo D, Park S, Lee J-Y, Kweon IS (2014) Fisher kernel for deep neural activations. arXiv:http://arXiv.org/abs/1412.1628
Zhou T, Krähenbühl P, Aubry M, Huang Q, Efros AA (2016) Learning dense correspondence via 3D-guided cycle consistency. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR). Las Vegas, pp 117–126
Zhu J-Y, Krähenbühl P, Shechtman E, Efros AA (2016) Generative visual manipulation on the natural image manifold. In: ECCV
Zhu J-Y, Park T, Isola P, Efros A A (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: 2017 IEEE international conference on computer vision (ICCV), pp 2242–2251
Acknowledgments
The authors thank Berkay Selbes for running the feature extraction time experiments.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Sert, M., Boyacı, E. Sketch recognition using transfer learning. Multimed Tools Appl 78, 17095–17112 (2019). https://doi.org/10.1007/s11042-018-7067-1
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-018-7067-1