Three-way enhanced part-aware network for fine-grained sketch-based image retrieval

Applied Intelligence

Abstract

Sketch-based image retrieval is of important practical significance in today’s world, which is populated by smart touch-screen devices. Fine-grained sketch-based image retrieval (FG-SBIR) is particularly challenging: it uses the characteristics of free-hand sketches to retrieve natural photos at the instance level. From the outline and semantic perspectives, a free-hand sketch may correspond to many natural photos. We call this relationship “one-to-many”, and it means that the effectiveness of FG-SBIR depends mainly on the quality of the fine-grained information extracted. Existing deep convolutional neural network (DCNN) models for FG-SBIR commonly use coarse or first-order attention modules to focus on specific local regions, yet they cannot capture high-order or complex information or the subtle differences between sketch–photo pairs. It is widely known that features learned from the higher layers of a network are more abstract and of a higher semantic level than those learned from the lower layers, but they lose some important fine-grained information. To address these limitations, this paper proposes a three-way enhanced part-aware network (EPAN), in which a mixed high-order attention module is added after the middle-level feature space to generate a variety of high-order attention maps and capture the rich features contained in the middle convolutional layer. An enhanced part-aware module is proposed to capture useful part cues and enhance the semantic consistency of local regions, which allows the network to learn more discriminative cross-domain feature representations. A large number of experiments on several popular datasets demonstrate that our model is superior to state-of-the-art approaches.
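
To make the instance-level retrieval setting concrete, the following minimal sketch ranks a gallery of photos by distance to a sketch embedding and measures how often the matching photo lands in the top k. It is not the EPAN model: the embeddings are hand-written toy vectors standing in for learned cross-domain features, and the function names `rank_photos` and `top_k_accuracy`, the Euclidean metric, and the top-k measure are all assumptions made for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_photos(sketch_embedding, photo_embeddings):
    """Return photo indices ordered from nearest to farthest from the sketch."""
    distances = [l2_distance(sketch_embedding, p) for p in photo_embeddings]
    return sorted(range(len(photo_embeddings)), key=lambda i: distances[i])


def top_k_accuracy(sketch_embeddings, photo_embeddings, true_indices, k):
    """Fraction of sketches whose paired photo appears among the k nearest."""
    hits = 0
    for sketch, target in zip(sketch_embeddings, true_indices):
        if target in rank_photos(sketch, photo_embeddings)[:k]:
            hits += 1
    return hits / len(true_indices)


# Toy data (hypothetical): three gallery photos and three query sketches,
# where sketch i is meant to retrieve photo i.
photos = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
sketches = [[0.1, 0.0], [0.9, 0.2], [0.6, 0.1]]
targets = [0, 1, 2]

ranking = rank_photos(sketches[1], photos)
acc_at_1 = top_k_accuracy(sketches, photos, targets, k=1)
acc_at_3 = top_k_accuracy(sketches, photos, targets, k=3)
```

In this toy example the third sketch lies closer to the wrong photos than to its own, which mirrors the difficulty the abstract describes: when several photos plausibly match one sketch, only sufficiently fine-grained features put the correct instance first.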

Acknowledgements

This study was supported by the National Natural Science Foundation of China (No. 61772032) and the National Key R&D Project (SQ2018YFC080102).

Author information

Corresponding author

Correspondence to Shoubiao Tan.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, X., Tang, J. & Tan, S. Three-way enhanced part-aware network for fine-grained sketch-based image retrieval. Appl Intell 52, 10901–10916 (2022). https://doi.org/10.1007/s10489-021-02960-9

  • DOI: https://doi.org/10.1007/s10489-021-02960-9
