Energy-Guided Feature Fusion for Zero-Shot Sketch-Based Image Retrieval

Neural Processing Letters

Abstract

Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) addresses the problem of retrieving photos given a query sketch whose category was not seen during training. ZS-SBIR inherits the main challenges of several computer vision tasks, including SBIR, zero-shot learning and domain adaptation. The domain gap between sketches and photos requires the model to extract meaningful semantic information. To bridge this gap, current methods mainly introduce additional word embeddings or design synthesis-based sub-networks. Taking a different perspective, we focus on feature extraction and propose a simple, plug-and-play feature fusion module that enriches and exploits semantic information; an energy function guides the fusion so that the resulting features yield better retrieval performance. The proposed method achieves state-of-the-art results on two widely used ZS-SBIR datasets, even surpassing some methods that use additional word embeddings.
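The idea of letting an energy function guide fusion can be illustrated with a minimal sketch. It assumes the SimAM-style inverse-energy weighting cited in the notes [19] and a plain element-wise sum as the fusion step; the function names `simam_weights` and `energy_guided_fuse`, the flattened single-channel input, and the sum-based fusion are hypothetical choices for illustration, not the authors' module, which is not shown in this preview.

```python
import math


def simam_weights(channel, lam=1e-4):
    """Return sigmoid(1 / e*) for each activation in one flattened channel.

    Uses the closed-form inverse minimal energy from SimAM [19]:
        1 / e* = (t - mu)^2 / (4 * (var + lam)) + 0.5
    """
    if len(channel) < 2:
        raise ValueError("need at least two activations per channel")
    n = len(channel) - 1  # SimAM divides by the number of *other* neurons
    mu = sum(channel) / len(channel)
    sq_dev = [(t - mu) ** 2 for t in channel]
    var = sum(sq_dev) / n
    inv_energy = [d / (4.0 * (var + lam)) + 0.5 for d in sq_dev]
    return [1.0 / (1.0 + math.exp(-v)) for v in inv_energy]


def energy_guided_fuse(feat_a, feat_b, lam=1e-4):
    """Hypothetical fusion: reweight each input by its own energy-based
    weights, then add the two reweighted features element-wise."""
    if len(feat_a) != len(feat_b):
        raise ValueError("features must have the same length")
    w_a = simam_weights(feat_a, lam)
    w_b = simam_weights(feat_b, lam)
    return [wa * a + wb * b for wa, a, wb, b in zip(w_a, feat_a, w_b, feat_b)]


if __name__ == "__main__":
    low_level = [0.0, 0.0, 0.0, 4.0]   # one activation stands out
    high_level = [1.0, 1.0, 1.0, 1.0]  # uniform channel
    print(simam_weights(low_level))
    print(energy_guided_fuse(low_level, high_level))
```

Activations that deviate most from the channel mean have the lowest energy and therefore receive the largest weights, so the distinctive position in `low_level` contributes more to the fused feature than the uniform background does.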

Notes

  1. Please refer to SimAM [19] for more details and derivation of Equation (1).

  2. The definition of feature levels follows the notation used in the timm library.
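For orientation, the closed-form minimal energy derived in SimAM [19] can be written as below. That it matches Equation (1) of this article symbol for symbol is an assumption, since the equation itself is not part of this preview. Here t is the target activation, \hat{\mu} and \hat{\sigma}^{2} are the mean and variance of the activations in the same channel, and \lambda is a regularisation coefficient.

```latex
e_t^{*} = \frac{4\,(\hat{\sigma}^{2} + \lambda)}{(t - \hat{\mu})^{2} + 2\hat{\sigma}^{2} + 2\lambda},
\qquad
\frac{1}{e_t^{*}} = \frac{(t - \hat{\mu})^{2}}{4\,(\hat{\sigma}^{2} + \lambda)} + \frac{1}{2}
```

The second form follows by inverting the first and splitting the numerator. A lower energy marks a more distinctive activation, which is why SimAM applies a sigmoid to the inverse energy to obtain an attention weight.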

References

  1. Kapoor R, Sharma D, Gulati T (2021) State of the art content based image retrieval techniques using deep learning: a survey. Multimed Tools Appl 80(19):29561–29583

  2. Yelamarthi SK, Reddy SK, Mishra A, Mittal A (2018) A zero-shot framework for sketch based image retrieval. In: European conference on computer vision, pp 300–317

  3. Dey S, Riba P, Dutta A, Llados J, Song Y-Z (2019) Doodle to search: practical zero-shot sketch-based image retrieval. In: IEEE conference on computer vision and pattern recognition, pp 2179–2188

  4. Liu Q, Xie L, Wang H, Yuille AL (2019) Semantic-aware knowledge preservation for zero-shot sketch-based image retrieval. In: International conference on computer vision, pp 3662–3671

  5. Zhang Z, Zhang Y, Feng R, Zhang T, Fan W (2020) Zero-shot sketch-based image retrieval via graph convolution network. In: AAAI conference on artificial intelligence, vol 34, pp 12943–12950

  6. Zhu J, Xu X, Shen F, Lee RK-W, Wang Z, Shen HT (2020) OCEAN: a dual learning approach for generalized zero-shot sketch-based image retrieval. In: IEEE international conference on multimedia & Expo, pp 1–6

  7. Chaudhuri U, Banerjee B, Bhattacharya A, Datcu M (2020) CrossATNet-a novel cross-attention based framework for sketch-based image retrieval. Image Vis Comput 104:104003

  8. Deng C, Xu X, Wang H, Yang M, Tao D (2020) Progressive cross-modal semantic network for zero-shot sketch-based image retrieval. IEEE Trans Image Process 29:8892–8902

  9. Le Q, Mikolov T (2014) Distributed representations of sentences and documents. In: International conference on machine learning, pp 1188–1196

  10. Liu L, Shen F, Shen Y, Liu X, Shao L (2017) Deep sketch hashing: fast free-hand sketch-based image retrieval. In: IEEE conference on computer vision and pattern recognition, pp 2862–2871

  11. Shen Y, Liu L, Shen F, Shao L (2018) Zero-shot sketch-image hashing. In: IEEE conference on computer vision and pattern recognition, pp 3598–3607

  12. Dutta T, Biswas S (2019) Style-guided zero-shot sketch-based image retrieval. In: British machine vision conference, p 9

  13. Dutta A, Akata Z (2019) Semantically tied paired cycle consistency for zero-shot sketch-based image retrieval. In: IEEE conference on computer vision and pattern recognition, pp 5089–5098

  14. Wang W, Shi Y, Chen S, Peng Q, Zheng F, You X (2021) Norm-guided adaptive visual embedding for zero-shot sketch-based image retrieval. In: International joint conference on artificial intelligence, pp 1106–1112

  15. Tursun O, Denman S, Sridharan S, Goan E, Fookes C (2022) An efficient framework for zero-shot sketch-based image retrieval. Pattern Recognit 21:108528

  16. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations

  17. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: IEEE conference on computer vision and pattern recognition, pp 770–778

  18. Zhang Z, Zhang X, Peng C, Xue X, Sun J (2018) ExFuse: enhancing feature fusion for semantic segmentation. In: European conference on computer vision, pp 269–284

  19. Yang L, Zhang R-Y, Li L, Xie X (2021) SimAM: a simple, parameter-free attention module for convolutional neural networks. In: International conference on machine learning, pp 11863–11874

  20. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: IEEE conference on computer vision and pattern recognition, pp 7132–7141

  21. Woo S, Park J, Lee J-Y, Kweon IS (2018) CBAM: convolutional block attention module. In: European conference on computer vision, pp 3–19

  22. Wang Q, Wu B, Zhu P, Li P, Zuo W, Hu Q (2020) ECA-net: efficient channel attention for deep convolutional neural networks. In: IEEE conference on computer vision and pattern recognition, pp 11534–11542

  23. Zhai A, Wu H-Y (2019) Classification is a strong baseline for deep metric learning. In: British machine vision conference, p 91

  24. Kaya M, Bilge HŞ (2019) Deep metric learning: a survey. Symmetry 11(9):1066

  25. Sangkloy P, Burnell N, Ham C, Hays J (2016) The sketchy database: learning to retrieve badly drawn bunnies. ACM Trans Gr 35(4):1–12

  26. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: IEEE conference on computer vision and pattern recognition, pp 248–255

  27. Eitz M, Hays J, Alexa M (2012) How do humans sketch objects? ACM Trans Gr 31(4):1–10

  28. Loshchilov I, Hutter F (2019) Decoupled weight decay regularization. In: International conference on learning representations

  29. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L et al (2019) PyTorch: an imperative style, high-performance deep learning library. Annu Conf Neural Inf Process Syst 32:8026–8037

  30. Xu X, Yang M, Yang Y, Wang H (2021) Progressive domain-independent feature decomposition network for zero-shot sketch-based image retrieval. In: International joint conference on artificial intelligence, pp 984–990

  31. Wang Z, Wang H, Yan J, Wu A, Deng C (2021) Domain-smoothing network for zero-shot sketch-based image retrieval. In: International joint conference on artificial intelligence, pp 1143–1149

  32. Tian J, Xu X, Wang Z, Shen F, Liu X (2021) Relationship-preserving knowledge distillation for zero-shot sketch based image retrieval. In: ACM international conference on multimedia, pp 5473–5481

Funding

This work was supported by the National Natural Science Foundation of China (No. 62072112) and the National Key R&D Program of China (2020AAA0108301).

Author information

Contributions

HR and ZZ contributed equally to this research. All authors contributed to the study’s conception and design. Material preparation, data collection and analysis were performed by HR. The first draft of the manuscript was written by ZZ and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Hong Lu.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ren, H., Zheng, Z. & Lu, H. Energy-Guided Feature Fusion for Zero-Shot Sketch-Based Image Retrieval. Neural Process Lett 54, 5711–5720 (2022). https://doi.org/10.1007/s11063-022-10881-y

  • DOI: https://doi.org/10.1007/s11063-022-10881-y
