Abstract
Zero-shot sketch-based image retrieval (ZS-SBIR) is an extension of sketch-based image retrieval (SBIR) that aims to retrieve relevant images using query sketches from unseen categories. Most previous methods focus on preserving semantic knowledge and improving domain alignment, but neglect the correlation between inter-modal features, which results in unsatisfactory performance. Hence, a sketch-image cross-modal retrieval framework is proposed to maximize the sketch-image correlation. For this framework, we develop a discriminant adversarial learning method that incorporates intra-modal discrimination, inter-modal consistency, and inter-modal correlation into a deep learning network for common feature representation learning. Specifically, sketch and image features are first projected into a shared feature subspace to achieve modality invariance. Subsequently, we adopt a category label predictor to achieve intra-modal discrimination, use adversarial learning to confuse modal information for inter-modal consistency, and introduce correlation learning to maximize inter-modal correlation. Finally, the trained deep learning model is evaluated on unseen categories. Extensive experiments conducted on three zero-shot datasets show that this method outperforms state-of-the-art methods. In retrieval accuracy on unseen categories, this method exceeds the state-of-the-art methods by approximately 0.6% on the RSketch dataset, 5% on the Sketchy dataset, and 7% on the TU-Berlin dataset. We also conduct experiments on an image-based 3D model scene retrieval dataset, where the proposed method significantly outperforms the state-of-the-art approaches on all standard metrics.
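The three training signals named in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss formulas, so everything here is an assumption made for illustration: the function names, the use of cross-entropy for the label predictor, binary cross-entropy for the modality discriminator, one minus cosine similarity as the correlation term, and the hypothetical weights `alpha` and `beta`. It is not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_loss(class_logits, label):
    """Cross-entropy of a category predictor (assumed form of the
    intra-modal discrimination term)."""
    return -math.log(softmax(class_logits)[label])


def modality_loss(p_sketch, is_sketch):
    """Binary cross-entropy of a modality discriminator, where p_sketch is
    its predicted probability that a feature came from a sketch (assumed
    form of the adversarial term)."""
    p = p_sketch if is_sketch else 1.0 - p_sketch
    return -math.log(p)


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def correlation_loss(sketch_feat, image_feat):
    """One minus cosine similarity for a same-category sketch-image pair
    (a stand-in for the correlation term; lower means more correlated)."""
    return 1.0 - cosine(sketch_feat, image_feat)


def encoder_objective(l_label, l_modality, l_corr, alpha=1.0, beta=1.0):
    """Assumed combined objective for the feature encoders: minimise the
    label and correlation losses while maximising the discriminator's loss,
    so the shared features carry little modality information."""
    return l_label + alpha * l_corr - beta * l_modality


if __name__ == "__main__":
    # Toy two-dimensional features in the shared subspace (made-up values).
    sketch, image = [1.0, 0.0], [1.0, 0.0]
    l_cls = label_loss([0.0, 0.0], label=0)        # uniform predictor
    l_mod = modality_loss(0.5, is_sketch=True)     # fully confused discriminator
    l_cor = correlation_loss(sketch, image)        # identical features
    print(l_cls, l_mod, l_cor, encoder_objective(l_cls, l_mod, l_cor))
```

In this toy setting a discriminator output of 0.5 corresponds to complete modality confusion, which is the state the adversarial term pushes the encoders toward, while identical paired features give a correlation loss of zero.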
Acknowledgements
This work was supported in part by the National Key R&D Program of China under Grant 2018YFB2101504, in part by the Key Research and Development Program of Shanxi Province of China under Grant 201903D121147, in part by the Natural Science Foundation of Shanxi Province of China under Grant 201901D111150, and in part by the Research Project Supported by Shanxi Scholarship Council of China under Grant 2020-113.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Jiao, S., Han, X., Xiong, F. et al. Deep cross-modal discriminant adversarial learning for zero-shot sketch-based image retrieval. Neural Comput & Applic 34, 13469–13483 (2022). https://doi.org/10.1007/s00521-022-07169-6
DOI: https://doi.org/10.1007/s00521-022-07169-6