Abstract
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) addresses the problem of retrieving a photo given a query sketch whose category was not seen during training. ZS-SBIR inherits the main challenges of several computer vision tasks, including SBIR, zero-shot learning, and domain adaptation. The domain gap between sketches and photos requires the model to extract meaningful semantic information. To reduce this gap, existing methods mainly introduce additional word embeddings or design synthesis-based sub-networks. We instead focus on feature extraction and propose a simple, plug-and-play feature fusion module that enriches and exploits semantic information; an energy function guides the fusion so that the resulting features yield better retrieval performance. The proposed method achieves state-of-the-art results on two widely used ZS-SBIR datasets, even surpassing some methods that use additional word embeddings.
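The abstract does not spell out the energy function or the fusion rule, so the following is only a rough illustration of what "energy-guided" fusion could look like, not the authors' module. It assumes a SimAM-style closed-form inverse energy per spatial position, and a hypothetical fusion rule that sums two same-shaped feature maps after gating each with the sigmoid of its own inverse energy. The names `inverse_energy`, `energy_guided_fuse`, and the parameter `lam` are made up for this sketch.

```python
import math


def inverse_energy(channel, lam=1e-4):
    """SimAM-style inverse energy for each position of one H x W channel.

    Positions that deviate strongly from the channel mean get a larger value.
    `lam` is a small regulariser (assumed value, not taken from the paper).
    """
    values = [v for row in channel for v in row]
    n = max(len(values) - 1, 1)  # number of "other" positions in the channel
    mean = sum(values) / len(values)
    sq_dev = [[(v - mean) ** 2 for v in row] for row in channel]
    var = sum(d for row in sq_dev for d in row) / n
    return [[d / (4.0 * (var + lam)) + 0.5 for d in row] for row in sq_dev]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def energy_guided_fuse(feat_a, feat_b, lam=1e-4):
    """Hypothetical fusion of two C x H x W feature maps (nested lists).

    Each map is gated element-wise by the sigmoid of its own inverse energy,
    then the two gated maps are summed.
    """
    fused = []
    for ch_a, ch_b in zip(feat_a, feat_b):
        e_a = inverse_energy(ch_a, lam)
        e_b = inverse_energy(ch_b, lam)
        fused.append([
            [a * sigmoid(ea) + b * sigmoid(eb)
             for a, b, ea, eb in zip(row_a, row_b, row_ea, row_eb)]
            for row_a, row_b, row_ea, row_eb in zip(ch_a, ch_b, e_a, e_b)
        ])
    return fused


# Toy example: one channel, 2 x 2 spatial grid.
shallow = [[[1.0, 1.0], [1.0, 1.0]]]   # flat map: every position gets the same gate
deep = [[[0.0, 0.0], [0.0, 4.0]]]      # one distinctive activation
fused = energy_guided_fuse(shallow, deep)
```

In this toy case the single distinctive activation in `deep` receives a larger gate than its neighbours, which is the intuition behind using an energy term to decide which features to emphasise during fusion. In a real network the inputs would be tensors from different backbone stages, resized to a common shape before fusion.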
Funding
This work was supported by the National Natural Science Foundation of China (No. 62072112) and the National Key R&D Program of China (2020AAA0108301).
Author information
Contributions
HR and ZZ contributed equally to this research. All authors contributed to the study’s conception and design. Material preparation, data collection and analysis were performed by HR. The first draft of the manuscript was written by ZZ and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Ren, H., Zheng, Z. & Lu, H. Energy-Guided Feature Fusion for Zero-Shot Sketch-Based Image Retrieval. Neural Process Lett 54, 5711–5720 (2022). https://doi.org/10.1007/s11063-022-10881-y