SBIR-BYOL: a self-supervised sketch-based image retrieval model

  • Original Article
Neural Computing and Applications

Abstract

Sketch-based image retrieval (SBIR) is attracting growing interest in the computer vision community due to its relevance to the visual perception system and its potential application in a wide range of industries. In the literature, we observe significant advances when models are evaluated on public datasets. However, when assessed in real environments, performance drops drastically. The main problem is that state-of-the-art (SOTA) SBIR models follow a supervised regime that depends strongly on a considerable amount of labeled sketch-photo pairs, which is infeasible in real contexts. Therefore, we propose SBIR-BYOL, an extension of the well-known BYOL that works in a bimodal scenario for sketch-based image retrieval. To this end, we also propose a two-stage self-supervised training methodology that exploits existing sketch-photo pairs and contour-photo pairs generated from photographs of a target catalog. We demonstrate the benefits of our model in eCommerce environments, where search is a critical component. Here, our self-supervised SBIR model shows an increase of over \(60\%\) in mean average precision (mAP).
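
To make the BYOL idea mentioned in the abstract concrete, the following is a minimal sketch of the standard BYOL regression loss and the exponential-moving-average (EMA) update of the target network, applied to a sketch-photo pair. It is not the authors' implementation: the abstract does not specify how the two modalities are wired together, so the symmetric pairing in `bimodal_byol_loss`, the function names, and the value of `tau` are illustrative assumptions, and the inputs stand in for embeddings that real encoders would produce.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to (approximately) unit Euclidean length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def byol_regression_loss(online_pred, target_proj):
    """Standard BYOL loss: 2 - 2 * cosine similarity, averaged over the batch.

    online_pred: predictions from the online network, shape (batch, dim).
    target_proj: projections from the target network, shape (batch, dim).
    In a real training loop no gradient flows through target_proj.
    """
    p = l2_normalize(online_pred)
    z = l2_normalize(target_proj)
    return float(np.mean(2.0 - 2.0 * np.sum(p * z, axis=-1)))


def bimodal_byol_loss(pred_sketch, proj_photo, pred_photo, proj_sketch):
    """Hypothetical symmetric loss over a sketch-photo pair (an assumption).

    The online prediction of one modality is regressed onto the target
    projection of the other modality, in both directions.
    """
    return (byol_regression_loss(pred_sketch, proj_photo)
            + byol_regression_loss(pred_photo, proj_sketch))


def ema_update(target_params, online_params, tau=0.99):
    """Move each target parameter towards the matching online parameter.

    tau close to 1 means the target network changes slowly.
    """
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_params, online_params)]
```

The loss is 0 when the prediction and the target point in the same direction and reaches 4 when they point in opposite directions, so minimizing it pulls the sketch and photo embeddings of a pair together without requiring negative examples.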

Author information

Corresponding author

Correspondence to Jose M. Saavedra.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose. The authors have no competing interests to declare that are relevant to the content of this article. All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript. The authors have no financial or proprietary interests in any material discussed in this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Saavedra, J.M., Morales, J. & Murrugarra-Llerena, N. SBIR-BYOL: a self-supervised sketch-based image retrieval model. Neural Comput & Applic 35, 5395–5408 (2023). https://doi.org/10.1007/s00521-022-07978-9

  • DOI: https://doi.org/10.1007/s00521-022-07978-9
