Deep cross-modal discriminant adversarial learning for zero-shot sketch-based image retrieval

Jiao, Shichao; Han, Xie; Xiong, Fengguang; Yang, Xiaowen; Han, Huiyan; He, Ligang; Kuang, Liqun

doi:10.1007/s00521-022-07169-6

Original Article
Published: 28 March 2022

Volume 34, pages 13469–13483 (2022)

Neural Computing and Applications

Shichao Jiao¹,
Xie Han¹,
Fengguang Xiong¹,
Xiaowen Yang¹,
Huiyan Han¹,
Ligang He² &
Liqun Kuang¹ (ORCID: orcid.org/0000-0003-3276-5748)

Abstract

Zero-shot sketch-based image retrieval (ZS-SBIR) is an extension of sketch-based image retrieval (SBIR) that aims to retrieve relevant images using query sketches from unseen categories. Most previous methods focus on preserving semantic knowledge and improving domain alignment, but neglect the correlation between inter-modal features, which results in unsatisfactory performance. Hence, a sketch-image cross-modal retrieval framework is proposed to maximize the sketch-image correlation. For this framework, we develop a discriminant adversarial learning method that incorporates intra-modal discrimination, inter-modal consistency, and inter-modal correlation into a deep learning network for common feature representation learning. Specifically, sketch and image features are first projected into a shared feature subspace to achieve modality invariance. Subsequently, we adopt a category label predictor to achieve intra-modal discrimination, use adversarial learning to confuse modal information for inter-modal consistency, and introduce correlation learning to maximize inter-modal correlation. Finally, the trained deep learning model is tested on unseen categories. Extensive experiments conducted on three zero-shot datasets show that this method outperforms state-of-the-art methods. In retrieval accuracy on unseen categories, this method exceeds the state-of-the-art methods by approximately 0.6% on the RSketch dataset, 5% on the Sketchy dataset, and 7% on the TU-Berlin dataset. We also conduct experiments on an image-based 3D model scene retrieval dataset, where the proposed method significantly outperforms the state-of-the-art approaches on all standard metrics.
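
The abstract names three training signals (a category label predictor, adversarial modality confusion, and correlation learning) without giving their formulas. The following is a minimal NumPy sketch of how such terms might be combined, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the linear classifier and discriminator weights `w_cls` and `w_mod`, the use of cosine similarity as the correlation measure, the weights `alpha` and `beta`, and the pairing of sketch i with image i of the same category.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between row-wise softmax(logits) and integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def correlation_loss(sketch_emb, image_emb):
    """One minus the mean cosine similarity of paired sketch/image embeddings.

    Cosine similarity is an assumed stand-in for the paper's correlation term.
    """
    s = sketch_emb / np.linalg.norm(sketch_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    return float(1.0 - (s * v).sum(axis=1).mean())


def encoder_objective(sketch_emb, image_emb, labels, w_cls, w_mod,
                      alpha=1.0, beta=1.0):
    """Hypothetical objective for the shared-subspace encoder.

    The encoder lowers the classification and correlation losses and raises
    the modality discriminator's loss, so that the two modalities become
    hard to tell apart in the shared subspace.
    """
    emb = np.concatenate([sketch_emb, image_emb], axis=0)
    class_labels = np.concatenate([labels, labels])
    modality_labels = np.concatenate([
        np.zeros(len(sketch_emb), dtype=int),  # 0 = sketch
        np.ones(len(image_emb), dtype=int),    # 1 = image
    ])
    l_cls = softmax_cross_entropy(emb @ w_cls, class_labels)     # intra-modal discrimination
    l_mod = softmax_cross_entropy(emb @ w_mod, modality_labels)  # discriminator loss
    l_cor = correlation_loss(sketch_emb, image_emb)              # inter-modal correlation
    total = l_cls + alpha * l_cor - beta * l_mod
    return total, {"cls": l_cls, "mod": l_mod, "cor": l_cor}


# Toy data: 4 paired samples, 8-dimensional embeddings, 3 seen categories.
rng = np.random.default_rng(0)
sketch = rng.normal(size=(4, 8))
image = sketch + 0.1 * rng.normal(size=(4, 8))
labels = np.array([0, 1, 2, 1])
w_cls = rng.normal(size=(8, 3))
w_mod = np.zeros((8, 2))  # untrained discriminator: predicts 50/50 for every sample
total, parts = encoder_objective(sketch, image, labels, w_cls, w_mod)
print(total, parts)
```

In a real training loop the discriminator would be updated separately to minimize its own loss while the encoder is updated against it; the sign flip on the modality term above only illustrates the encoder side of that game.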



Acknowledgements

This work was supported in part by the National Key R&D Program of China under Grant 2018YFB2101504, in part by the Key Research and Development Program of Shanxi Province of China under Grant 201903D121147, in part by the Natural Science Foundation of Shanxi Province of China under Grant 201901D111150, and in part by the Research Project Supported by Shanxi Scholarship Council of China under Grant 2020-113.

Author information

Authors and Affiliations

School of Data Science and Technology, North University of China, Taiyuan, China
Shichao Jiao, Xie Han, Fengguang Xiong, Xiaowen Yang, Huiyan Han & Liqun Kuang
Department of Computer, The University of Warwick, Warwick, United Kingdom
Ligang He

Corresponding author

Correspondence to Liqun Kuang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jiao, S., Han, X., Xiong, F. et al. Deep cross-modal discriminant adversarial learning for zero-shot sketch-based image retrieval. Neural Comput & Applic 34, 13469–13483 (2022). https://doi.org/10.1007/s00521-022-07169-6

Received: 14 September 2021
Accepted: 02 March 2022
Published: 28 March 2022
Issue Date: August 2022
DOI: https://doi.org/10.1007/s00521-022-07169-6
