Abstract
Sketch-based image retrieval is of important practical significance in today’s world, which is populated by smart touch-screen devices. Fine-grained sketch-based image retrieval (FG-SBIR) is particularly challenging: it uses the characteristics of free-hand sketches to retrieve natural photos at the instance level. From the outline and semantic perspectives, a free-hand sketch may correspond to many natural photos. We call this relationship “one-to-many”, and it means that the effectiveness of FG-SBIR depends mainly on the quality of the fine-grained information extracted. Existing deep convolutional neural network (DCNN) models for FG-SBIR commonly use coarse or first-order attention modules to focus on specific local regions, yet they cannot capture high-order or complex information or the subtle differences between sketch–photo pairs. It is widely known that features learned from the higher layers of a network are more abstract and of a higher semantic level than those learned from the lower layers, but they lose some important fine-grained information. To address these limitations, this paper proposes a three-way enhanced part-aware network (EPAN), in which a mixed high-order attention module is added after the middle-level feature space to generate a variety of high-order attention maps and capture the rich features contained in the middle convolutional layer. An enhanced part-aware module is proposed to capture useful part cues and enhance the semantic consistency of local regions, which allows the network to learn more discriminative cross-domain feature representations. A large number of experiments on several popular datasets demonstrate that our model is superior to state-of-the-art approaches.
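To make the contrast between first-order and high-order attention concrete, the following is a minimal sketch, not the EPAN module itself. It assumes that a first-order attention weight is a sigmoid of a single linear map of a local feature vector, and that a second-order weight adds products of two linear maps so that channel interactions contribute. All function names, the weight vectors `w` and `pairs`, and the toy feature map are hypothetical and chosen only for illustration.

```python
import math
import random


def dot(u, x):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, x))


def sigmoid(t):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def first_order_attention(x, w):
    """Attention weight from one linear map of the local feature x."""
    return sigmoid(dot(w, x))


def second_order_attention(x, w, pairs):
    """Adds products of two linear maps, so terms x_i * x_j affect the weight."""
    score = dot(w, x)
    for u, v in pairs:
        score += dot(u, x) * dot(v, x)
    return sigmoid(score)


def attend(feature_map, attention_fn):
    """Reweight each spatial position of an H x W x C feature map (nested lists)."""
    return [[[attention_fn(x) * c for c in x] for x in row] for row in feature_map]


# Toy example: a 2 x 3 feature map with 4 channels and random (untrained) weights.
rng = random.Random(0)
C = 4
fmap = [[[rng.uniform(-1, 1) for _ in range(C)] for _ in range(3)] for _ in range(2)]
w = [rng.uniform(-1, 1) for _ in range(C)]
pairs = [
    ([rng.uniform(-1, 1) for _ in range(C)], [rng.uniform(-1, 1) for _ in range(C)])
    for _ in range(2)
]
attended = attend(fmap, lambda x: second_order_attention(x, w, pairs))
```

With an empty `pairs` list the second-order weight reduces to the first-order one, which shows that the higher-order terms are an extension of, rather than a replacement for, the linear attention score.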









Acknowledgements
This study was supported by the National Natural Science Foundation of China (No. 61772032) and the National Key R&D Project (SQ2018YFC080102).
Cite this article
Wang, X., Tang, J. & Tan, S. Three-way enhanced part-aware network for fine-grained sketch-based image retrieval. Appl Intell 52, 10901–10916 (2022). https://doi.org/10.1007/s10489-021-02960-9
DOI: https://doi.org/10.1007/s10489-021-02960-9