
Cross-domain image retrieval: methods and applications

  • Trends and Surveys
International Journal of Multimedia Information Retrieval

Abstract

Cross-domain images appear in an increasing number of applications. This trend creates demand for cross-domain image retrieval (CDIR), which finds images in one visual domain given a query image from another visual domain. Although image retrieval has been studied extensively, research on CDIR remains at an early stage. This study systematically surveys the methods and applications of CDIR. Because images from different visual domains exhibit different features, the main challenge of CDIR is to learn discriminative feature representations while preserving the domain-invariant features of those images. Existing CDIR methods are categorized and analyzed according to the stage at which images from different visual domains are transformed: one category is based on feature space migration, and the other is based on image domain migration. Applications of CDIR in clothing, infrared, remote sensing, sketch, and other scenarios are then summarized. Finally, existing CDIR schemes are summarized and new directions for future research are proposed.
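As an illustration of the retrieval setting described above, the following minimal sketch ranks gallery images from one domain against a query from another, assuming both have already been mapped into a shared feature space. It is not taken from the survey: the embedding values, the image identifiers, and the choice of cosine similarity as the ranking measure are all assumptions made for the example, and the domain-specific encoders that would produce the embeddings are left out.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve(
    query_embedding: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank gallery images by similarity to the query, best match first."""
    scored = [
        (image_id, cosine_similarity(query_embedding, embedding))
        for image_id, embedding in gallery.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings: the gallery holds photo-domain features and the
# query is a sketch-domain feature assumed to lie in the same shared space.
gallery = {
    "photo_shoe": [0.9, 0.1, 0.0],
    "photo_bag": [0.1, 0.9, 0.1],
    "photo_hat": [0.0, 0.2, 0.9],
}
sketch_query = [0.8, 0.2, 0.1]

ranking = retrieve(sketch_query, gallery, top_k=2)
for image_id, score in ranking:
    print(f"{image_id}: {score:.3f}")
```

In this toy setup the ranking is only as good as the shared space: if the two encoders did not produce domain-invariant features, nearest neighbours by any similarity measure would not correspond to semantically matching images.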


References

  1. Ghosh N, Agrawal S, Motwani M (2018) A survey of feature extraction for content-based image retrieval system. In: Proceedings of international conference on recent advancement on computer and communication. https://doi.org/10.1007/978-981-10-8198-9_32

  2. Ji X, Wang W, Zhang M et al (2017) Cross-domain image retrieval with attention modeling. In: Proceedings of the 25th ACM international conference on multimedia. pp 1654–1662

  3. Bae HB, Jeon T, Lee Y et al (2020) Non-visual to visual translation for cross-domain face recognition. IEEE Access 8:50452–50464

    Article  Google Scholar 

  4. Lu X, Zhong Y, Zheng Z et al (2021) Cross-domain road detection based on global-local adversarial learning framework from very high resolution satellite imagery[J]. ISPRS J Photogramm Remote Sens 180:296–312

    Article  Google Scholar 

  5. Hameed IM, Abdulhussain SH, Mahmmod BM (2021) Content-based image retrieval: a review of recent trends. Cogent Eng 8(1):1927469. https://doi.org/10.1080/23311916.2021.1927469

    Article  Google Scholar 

  6. Shao H, Wu Y, Cui W, et al (2008) Image retrieval based on MPEG-7 dominant color descriptor. In: 2008 The 9th international conference for young computer scientists. Pp 753–757. https://doi.org/10.1109/ICYCS.2008.89

  7. Duanmu X (2010) Image retrieval using color moment invariant. In: 2010 Seventh international conference on information technology: new generations. pp 200–203. https://doi.org/10.1109/ITNG.2010.231

  8. Wang XY, Zhang BB, Yang HY (2014) Content-based image retrieval by integrating color and texture features. Multimed Tools Appl 68(3):545–569. https://doi.org/10.1007/s11042-012-1055-7

    Article  Google Scholar 

  9. Tian PD (2013) A review on image feature extraction and representation techniques. Int J Multimed Ubiquitous Eng 8(4):385–396

    Google Scholar 

  10. Zhang D, Lu G (2004) Review of shape representation and description techniques. Pattern Recogn 37(1):1–19. https://doi.org/10.1016/j.patcog.2003.07.008

    Article  Google Scholar 

  11. Irtaza A, Jaffar MA (2015) Categorical image retrieval through genetically optimized support vector machines (GOSVM) and hybrid texture features. SIViP 9(7):1503–1519. https://doi.org/10.1007/s11760-013-0601-8

    Article  Google Scholar 

  12. Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol (TIST) 2(3):1–27. https://doi.org/10.1145/1961189.1961199

    Article  Google Scholar 

  13. Fadaei S, Amirfattahi R, Ahmadzadeh MR (2017) Local derivative radial patterns: A new texture descriptor for content-based image retrieval. Signal Process 137:274–286. https://doi.org/10.1016/j.sigpro.2017.02.013

    Article  Google Scholar 

  14. Khan R, Barat C, Muselet D, et al (2012) Spatial orientations of visual word pairs to improve bag-of-visual-words model. In: Proceedings of the British machine vision conference. Pp 89.1–89.11. https://doi.org/10.5244/C.26.89

  15. Anwar H, Zambanini S, Kampel M (2014) A rotation-invariant bag of visual words model for symbols based ancient coin classification. In: 2014 IEEE international conference on image processing (ICIP), pp 5257–5261. https://doi.org/10.1109/ICIP.2014.7026064

  16. Shi X, Sapkota M, Xing F et al (2018) Pairwise based deep ranking hashing for histopathology image classification and retrieval. Pattern Recogn 81:14–22. https://doi.org/10.1016/j.patcog.2018.03.015

    Article  Google Scholar 

  17. Zhu L, Shen J, Xie L et al (2016) Unsupervised visual hashing with a semantic assistant for content-based image retrieval. IEEE Trans Knowl Data Eng 29(2):472–486. https://doi.org/10.1109/TKDE.2016.2562624

    Article  Google Scholar 

  18. Alzu’bi A, Amira A, Ramzan N (2017) Content-based image retrieval with compact deep convolutional features. Neurocomputing 249:95–105. https://doi.org/10.1016/j.neucom.2017.03.072

    Article  Google Scholar 

  19. Kateb B, Yamamoto V, Yu C et al (2009) Infrared thermal imaging: a review of the literature and case report. Neuroimage 47:T154–T162. https://doi.org/10.1016/j.neuroimage.2009.03.043

    Article  Google Scholar 

  20. Eitz M, Hays J, Alexa M (2012) How do humans sketch objects? ACM Trans Graph (TOG) 31(4):1–10. https://doi.org/10.1145/2185520.2185540

    Article  Google Scholar 

  21. Laubrock J, Dunst A (2020) Computational approaches to comics analysis[J]. Top Cogn Sci 12(1):274–310. https://doi.org/10.1111/tops.12476

    Article  Google Scholar 

  22. Howarth P, Rüger S (2004) Evaluation of texture features for content-based image retrieval. In: International conference on image and video retrieval. pp 326–334. Springer, Berlin, Heidelberg

  23. Syam B, Rao Y (2013) An effective similarity measure via genetic algorithm for content based image retrieval with extensive features. Int Arab J Inf Technol (IAJIT) 10(2):143–151

    Google Scholar 

  24. Lowe DG (1999) Object recognition from local scale-invariant features. In: Proceedings of the IEEE international conference on computer vision. pp 1150–1157

  25. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110

    Article  Google Scholar 

  26. Ali N, Bajwa KB, Sablatnig R, Chatzichristofis SA, Iqbal Z, Rashid M, Habib HA (2016) A novel image retrieval based on visual words integration of SIFT and SURF. PLoS ONE 11(6):e0157428

    Article  Google Scholar 

  27. Kodituwakku SR, Selvarajah S (2004) Comparison of color features for image retrieval. Indian J Comput Sci Eng 1(3):207–211

    Google Scholar 

  28. Müller H, Michoux N, Bandon D, Geissbuhler A (2004) A review of content-based image retrieval systems in medical applications—clinical benefits and future directions. Int J Med Inform 73(1):1–23

    Article  Google Scholar 

  29. Haralick RM, Shanmugam K, Dinstein IH (1973) Textural features for image classification. IEEE Trans Syst Man Cybern 6:610–621

    Article  Google Scholar 

  30. Liu J (2013) Image retrieval based on bag-of-words model. arXiv:1304.5168, https://arxiv.org/abs/1304.5168

  31. Amato G, Bolettieri P, Falchi F, et al (2013) Large scale image retrieval using vector of locally aggregated descriptors. In: International conference on similarity search and applications. pp 245–256. https://doi.org/10.1007/978-3-642-41062-8_25

  32. Perronnin F, Liu Y, Sánchez J, et al (2010) Large-scale image retrieval with compressed fisher vectors. In: 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 3384–3391, https://doi.org/10.1109/CVPR.2010.5540009

  33. Qayyum A, Anwar SM, Awais M et al (2017) Medical image retrieval using deep convolutional neural network. Neurocomputing 266:8–20. https://doi.org/10.1016/j.neucom.2017.05.025

    Article  Google Scholar 

  34. Wan J, Wang D, Hoi S C H, et al (2014) Deep learning for content-based image retrieval: A comprehensive study. In: Proceedings of the 22nd ACM international conference on Multimedia. Pp 157–166. https://doi.org/10.1145/2647868.2654948

  35. Liu Z, Luo P, Qiu S et al (2016) Deepfashion: powering robust clothes recognition and retrieval with rich annotations. IEEE Conf Comput Vis Pattern Recogn (CVPR) 2016:1096–1104. https://doi.org/10.1109/CVPR.2016.124

    Article  Google Scholar 

  36. Ji X, Wang W, Zhang M, et al (2107) Cross-domain image retrieval with attention modeling. In: Proceedings of the 25th ACM international conference on Multimedia. pp 1654–1662. https://doi.org/10.1145/3123266.3123429

  37. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:1097–1105. https://doi.org/10.1145/3065386

    Article  Google Scholar 

  38. Wang W, Zhang M, Chen G et al (2016) Database meets deep learning: Challenges and opportunities. ACM SIGMOD Rec 45(2):17–22. https://doi.org/10.1145/3003665.3003669

    Article  Google Scholar 

  39. Huan-huan WANG, Sheng-nan CHU, Jing-wei GU (2021) Evaluation method of vehicle side modeling based on neural network. J Graph 42(4):688–695. https://doi.org/10.11996/JG.j.2095-302X.2021040688

    Article  Google Scholar 

  40. Kim Y (2014) Convolutional neural networks for sentence classification. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1746–1751. https://doi.org/10.3115/v1/D14-1181

  41. Peng-fei Z, Zhi-liang S, Xiao-yao L, Xiang-bo O (2021) Classification algorithm of main bearing cap based on deep learning. J Graph 42(4):572–580. https://doi.org/10.11996/JG.j.2095-302X.2021040572

    Article  Google Scholar 

  42. Karpathy A, Toderici G, Shetty S et al (2014) Large-scale video classification with convolutional neural networks. IEEE Conf Comput Vis Pattern Recogn 2014:1725–1732. https://doi.org/10.1109/CVPR.2014.223

    Article  Google Scholar 

  43. Babenko A, Slesarev A, Chigorin A et al (2014) Neural codes for image retrieval. In: European conference on computer vision. pp 584–599. https://doi.org/10.1007/978-3-319-10590-1_38

  44. Zhou D, Li X, Zhang YJ (2016) A novel CNN-based match kernel for image retrieval. IEEE Int Conf Image Process (ICIP) 2016:2445–2449. https://doi.org/10.1109/ICIP.2016.7532798

    Article  Google Scholar 

  45. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, https://arxiv.org/abs/1409.1556

  46. Lei J, Zheng K, Zhang H et al (2017) Sketch based image retrieval via image-aided cross domain learning. IEEE Int Conf Image Process (ICIP) 2017:3685–3689. https://doi.org/10.1109/ICIP.2017.8296970

    Article  Google Scholar 

  47. Ha I, Kim H, Park S et al (2018) Image retrieval using BIM and features from pretrained VGG network for indoor localization. Build Environ 140:23–31. https://doi.org/10.1016/j.buildenv.2018.05.026

    Article  Google Scholar 

  48. Wang X, Duan X, Bai X (2016) Deep sketch feature for cross-domain image retrieval. Neurocomputing 207:387–397. https://doi.org/10.1016/j.neucom.2016.04.046

    Article  Google Scholar 

  49. Chopra S, Hadsell R, LeCun Y (2005) Learning a similarity metric discriminatively, with application to face verification. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05), pp 539–546, https://doi.org/10.1109/CVPR.2005.202

  50. Hoffer E, Ailon N (2015) Deep metric learning using triplet network. In: International workshop on similarity-based pattern recognition, pp 84–92. https://doi.org/10.1007/978-3-319-24261-3_7

  51. Chen W, Chen X, Zhang J et al (2017) Beyond triplet loss: a deep quadruplet network for person re-identification. arXiv:1704.01719, https://arxiv.org/abs/1704.01719

  52. Nair V, Hinton GE (2010) Rectified linear units improve restricted boltzmann machines. In: Proceedings of the 27th international conference on international conference on machine learning, pp 807–814

  53. Kumar VA, Rajesh KS, Wilscy M (2019) Cross domain descriptor for sketch based image retrieval using siamese network. In: 2019 Fifth International Conference on Image Information Processing (ICIIP), pp 591–596. https://doi.org/10.1109/ICIIP47207.2019

  54. Sangkloy P, Burnell N, Ham C et al (2016) The sketchy database: learning to retrieve badly drawn bunnies. ACM Trans Graph (TOG) 35(4):1–12. https://doi.org/10.1145/2897824.2925954

    Article  Google Scholar 

  55. Qi Y, Song YZ, Zhang H et al (2016) Sketch-based image retrieval via siamese convolutional neural network. In: 2016 IEEE international conference on image processing (ICIP), pp 2460–2464. https://doi.org/10.1109/ICIP.2016.7532801

  56. Du H, Shi H, Liu Y et al (2021) Towards NIR-VIS Masked Face Recognition. IEEE Signal Process Lett 28:768–772. https://doi.org/10.1109/LSP.2021.3071663

    Article  Google Scholar 

  57. Wu A, Zheng WS, Yu HX et al (2017) RGB-infrared cross-modality person re-identification. In: 2017 IEEE international conference on computer vision (ICCV), pp 5390–5399, https://doi.org/10.1109/ICCV.2017.575

  58. Wang G, Yuan Y, Chen X et al (2018) Learning discriminative features with multiple granularities for person re-identification. In: Proceedings of the 26th ACM international conference on Multimedia, pp 274–282 https://doi.org/10.1145/3240508.3240552

  59. Xiang X, Lv N, Yu Z et al (2019) Cross-modality person re-identification based on dual-path multi-branch network. IEEE Sens J 19(23):11706–11713. https://doi.org/10.1109/JSEN.2019.2936916

    Article  Google Scholar 

  60. Yu Q, Liu F, Song YZ et al (2016) Sketch me that shoe. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 799–807. https://doi.org/10.1109/CVPR.2016.93

  61. Lin H, Fu Y, Lu P et al (2019) Tc-net for isbir: Triplet classification network for instance-level sketch based image retrieval. In: Proceedings of the 27th ACM international conference on multimedia, pp 1676–1684. https://doi.org/10.1145/3343031.3350900

  62. Huang G, Liu Z, Weinberger KQ, van Der Maaten L (2017) Densely connected convolutional networks. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 2261–2269, https://doi.org/10.1109/CVPR.2017.243

  63. Lee T, Lin YL, Chiang HY et al (2018) Cross-domain image-based 3D shape retrieval by view sequence learning. In: 2018 International conference on 3D vision (3DV), pp 258–266. https://doi.org/10.1109/3DV.2018.00038

  64. Song J, Song YZ, Xiang T et al (2017) Fine-grained image retrieval: the text/sketch input dilemma. In: The 28th British machine vision conference, p 12. https://doi.org/10.5244/C.31.45

  65. Fuentes A, Saavedra JM (2021) Sketch-QNet: a quadruplet convnet for color sketch-based image retrieval. In: Proceedings of the 2021 IEEE/CVF conference on computer vision and pattern recognition, pp 2134–2141. https://doi.org/10.1109/CVPRW53098.2021.00242

  66. Gong Y, Ke Q, Isard M et al (2014) A multi-view embedding space for modeling internet images, tags, and their semantics. Int J Comput Vision 106(2):210–233. https://doi.org/10.1007/s11263-013-0658-4

    Article  Google Scholar 

  67. Eitz M, Hays J, Alexa M (2012) How do humans sketch objects? In SIGGRAPH 31:1–10. https://doi.org/10.1145/2185520.2185540

    Article  Google Scholar 

  68. Miao Y, Li G, Bao C et al (2020) ClothingNet: cross-domain clothing retrieval with feature fusion and quadruplet loss. IEEE Access 8:142669–142679. https://doi.org/10.1109/ACCESS.2020.3013631

    Article  Google Scholar 

  69. Xing EP, Jordan MI, Russell SJ, Ng AY (2003) Distance metric learning with application to clustering with side-information. In: Proceedings of the international conference on neural information processing systems (NIPS), pp.521–528

  70. Sun Y, Zheng L, Yang Y, Tian Q, Wang S (2018) Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline). In: Computer Vision – ECCV 2018, pp 501–518. https://doi.org/10.1007/978-3-030-01225-0_30

  71. Yao H, Zhang S, Hong S, Zhang Y, Xu C, Tian Q (2019) Deep representation learning with part loss for person re-identification. IEEE Trans Image Process. https://doi.org/10.1109/TIP.2019.2891888

    Article  MathSciNet  MATH  Google Scholar 

  72. Hadsell R, Chopra S, and LeCun Y (2006) Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR'06), pp 1735–1742. https://doi.org/10.1109/CVPR.2006.100

  73. Schroff F, Kalenichenko D, Philbin J (2015) Facenet: a unified embedding for face recognition and clusterin. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR). pp 815–823. https://doi.org/10.1109/CVPR.2015.7298682

  74. Reale C, Nasrabadi NM, Kwon H et al (2016) Seeing the forest from the trees: A holistic approach to near-infrared heterogeneous face recognition. In: 2016 IEEE conference on computer vision and pattern recognition workshops (CVPRW), pp 320–328, https://doi.org/10.1109/CVPRW.2016.47

  75. Bell S, Bala K (2015) Learning visual similarity for product design with convolutional neural networks. ACM Trans Graph (TOG) 34(4):1–10. https://doi.org/10.1145/2766959

    Article  Google Scholar 

  76. Wang X, Sun Z, Zhang W et al (2016) Matching user photos to online products with robust deep features. In: Proceedings of the 2016 ACM on international conference on multimedia retrieval, pp 7–14. https://doi.org/10.1145/2911996.2912002

  77. Bui T, Ribeiro L, Ponti M et al (2017) Compact descriptors for sketch-based image retrieval using a triplet loss convolutional neural network. Comput Vis Image Underst 164:27–37. https://doi.org/10.1016/j.cviu.2017.06.007

    Article  Google Scholar 

  78. Xiong W, Xiong Z, Cui Y et al (2020) A discriminative distillation network for cross-source remote sensing image retrieval. IEEE J Select Topics Appl Earth Observ Remote Sens 13:1234–1247. https://doi.org/10.1109/JSTARS.2020.2980870

    Article  Google Scholar 

  79. Wen Y, Zhang K, Li Z, Qiao Y (2016) A discriminative feature learning approach for deep face recognition. Computer Vision – ECCV 2016, pp 499–515. https://doi.org/10.1007/978-3-319-46478-7_31

  80. Arandjelović R, Gronat P, Torii A, Pajdla T et al (2016) NetVLAD: CNN architecture for weakly supervised place recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5297–5307. https://doi.org/10.1109/TPAMI.2017.2711011.

  81. Sohn K (2016) Improved deep metric learning with multi-class n-pair loss objective. Adv Neural Inf Process Syst 29:1857–1865

    Google Scholar 

  82. Song HO, Xiang Y, Jegelka S et al (2016) Deep metric learning via lifted structured feature embedding. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 4004–4012. https://doi.org/10.1109/CVPR.2016.434

  83. Wang J, Zhou F, Wen S et al (2017) Deep metric learning with angular loss. In: Proceedings of the IEEE international conference on computer vision, pp 2593–2601. https://doi.org/10.1109/ICCV.2017.283

  84. Hermans A, Beyer L, Leibe B (2017) In defense of the triplet loss for person re-identification. arXiv:1703.07737, https://arxiv.org/abs/1703.07737

  85. Ibrahimi S, van Noord N, Geradts Z et al (2019) Deep metric learning for cross-domain fashion instance retrieval. In: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp 3165–3168. https://doi.org/10.1109/ICCVW.2019.00390

  86. Andoni A, Indyk P (2006) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In: 2006 47th annual IEEE symposium on foundations of computer science (FOCS'06). IEEE, pp 459–468. https://doi.org/10.1109/FOCS.2006.49

  87. Weiss Y, Torralba A, Fergus R (2008) Spectral hashing. In: 22nd annual conference on neural information processing systems. pp 1753–1760

  88. Liu W, Wang J, Kumar S et al (2011) Hashing with graphs. In: Proceedings of the 28th international conference on machine learning

  89. Heo J P, Lee Y, He J et al (2012) Spherical hashing. In: 2012 IEEE conference on computer vision and pattern recognition, pp 2957–2964

  90. Gong Y, Lazebnik S, Gordo A et al (2012) Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval. IEEE Trans Pattern Anal Mach Intell 35(12):2916–2929. https://doi.org/10.1109/TPAMI.2012.193

    Article  Google Scholar 

  91. Kalantidis Y, Kennedy L, Li L J (2013) Getting the look: clothing recognition and segmentation for automatic product suggestions in everyday photos. In: Proceedings of the 3rd ACM conference on International conference on multimedia retrieval, pp 105–112. https://doi.org/10.1145/2461466.2461485

  92. Xia R, Pan Y, Lai H et al (2014) Supervised hashing for image retrieval via image representation learning. In: Twenty-eighth AAAI conference on artificial intelligence, pp 2156–2162

  93. Lin K, Yang HF, Hsiao JH et al (2015) Deep learning of binary hash codes for fast image retrieval. In: 2015 IEEE conference on computer vision and pattern recognition workshops (CVPRW), pp 27–35. https://doi.org/10.1109/CVPRW.2015.7301269

  94. Wang D, Cui P, Ou M et al (2015) Learning compact hash codes for multimodal representations using orthogonal deep structure. IEEE Trans Multimedia 17(9):1404–1416. https://doi.org/10.1109/TMM.2015.2455415

    Article  Google Scholar 

  95. Lin K, Lu J, Chen CS et al (2016) Learning compact binary descriptors with unsupervised deep neural networks. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 1183–1192. https://doi.org/10.1109/CVPR.2016.133

  96. Liu L, Shen F, Shen Y et al (2017) Deep sketch hashing: Fast free-hand sketch-based image retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2862–2871

  97. Shen Y, Liu L, Shen F et al (2018) Zero-shot sketch-image hashing. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3598–3607. https://doi.org/10.1109/CVPR.2018.00379

  98. Liu J, Zhang L (2019) Optimal projection guided transfer hashing for image retrieval. In: Proceedings of the AAAI conference on artificial intelligence, pp 8754–8761. https://doi.org/10.1109/TCSVT.2019.2943902

  99. Li Y, Zhang Y, Huang X et al (2018) Learning source-invariant deep hashing convolutional neural networks for cross-source remote sensing image retrieval. IEEE Trans Geosci Remote Sens 56(11):6521–6536. https://doi.org/10.1109/TGRS.2018.2839705

    Article  Google Scholar 

  100. Xiong W, Xiong Z, Zhang Y et al (2020) A deep cross-modality hashing network for SAR and optical remote sensing images retrieval. IEEE J Select Topics Appl Earth Observ Remote Sens 13:5284–5296. https://doi.org/10.1109/JSTARS.2020.3021390

    Article  Google Scholar 

  101. Gatys LA, Ecker AS, Bethge M (2016) Image style transfer using convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 2414–2423. https://doi.org/10.1109/CVPR.2016.265

  102. Kingma DP, Welling M (2014) Auto-encoding variational bayes. arXiv:1312.6114, https://arxiv.org/abs/1312.6114v5

  103. Goodfellow I, Pouget-Abadie J, Mirza M et al (2020) Generative adversarial networks. Commun ACM 63(11):139–144

    Article  MathSciNet  Google Scholar 

  104. Pang K, Song Y Z, Xiang T et al (2017) Cross-domain Generative Learning for Fine-Grained Sketch-Based Image Retrieval. BMVC, pp 1–12. https://doi.org/10.5244/C.31.46

  105. Kampelmuhler M, Pinz A (2020) Synthesizing human-like sketches from natural images using a conditional convolutional decoder. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 3203–3211

  106. Beal MJ (2003) Variational algorithms for approximate Bayesian inference. University of London, University College London (United Kingdom)

    Google Scholar 

  107. Yelamarthi S K, Reddy S K, Mishra A et al (2018) A zero-shot framework for sketch based image retrieval. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 300–317. https://doi.org/10.1007/978-3-030-01225-0_19

  108. Lei H, Chen S, Wang M et al (2021) A new algorithm for sketch-based fashion image retrieval based on cross-domain transformation. Wirel Commun Mobile Comput. https://doi.org/10.1155/2021/5577735

    Article  Google Scholar 

  109. Sain A, Bhunia AK, Yang Y et al (2021) Stylemeup: towards style-agnostic sketch-based image retrieval. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8504–8513

  110. Arjovsky M, Bottou L (2017) Towards principled methods for training generative adversarial networks. arXiv:1701.04862, https://arxiv.org/abs//1701.0486220

  111. Chen X, Duan Y, Houthooft R et al (2016) Infogan: Interpretable representation learning by information maximizing generative adversarial nets. In: Proceedings of the 30th international conference on neural information processing systems, pp 2180–2188

  112. Denton E, Chintala S, Szlam A et al(2015) Deep generative image models using a laplacian pyramid of adversarial networks. arXiv:1506.05751, https://arxiv.org/abs/1506.05751

  113. Donahue J, Krähenbühl P, Darrell T (2017) Adversarial feature learning. arXiv:1605.09782, 2016. https://arxiv.org/abs/1605.09782

  114. Lin-long F, Yi L, Xiao-qin Z (2021) Generative adversarial network-based local facial stylization generation algorithm. J Graph 42(1):44–51. https://doi.org/10.11996/JG.j.2095-302X.2021010044

    Article  Google Scholar 

  115. Jian-jian JI, Gang YANG (2019) Hierarchical joint image completion method based on generative adversarial network. J Graph. https://doi.org/10.11996/JG.j.2095-302X.2019061008

    Article  Google Scholar 

  116. Qi-bin LUO, Qiang CAI (2019) Blind motion image deblurring using two-frame generative adversarial network. J Graph. https://doi.org/10.11996/JG.j.2095-302X.2019061056

    Article  Google Scholar 

  117. Zheng Z, Zheng L, Yang Y (2017) Unlabeled samples generated by GAN improve the person re-identification baseline in vitro. In: Proceedings of the 2017 IEEE international conference on comput vision, pp 3754–3762. https://doi.org/10.1109/ICCV.2017.405

  118. Zhong Z, Zheng L, Zheng ZD, Li SZ, Yang Y (2018) Camera style adaptation for person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5157–5166. https://doi.org/10.1109/CVPR.2018.00541

  119. Szegedy C, Vanhoucke V, Ioffe S et al (2016) Rethinking the inception architecture for computer vision. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 2818–2826. https://doi.org/10.1109/CVPR.2016.308

  120. Liu J,Ni B,Yan Y et al (2018) Pose transferrable person re-identification. In: Proceedings of the 2018 IEEE/CVF conference on computer vision and pattern recognition, 2018, pp.4099–4108. https://doi.org/10.1109/CVPR.2018.00431

  121. Liu C, Chang X, Shen YD (2020) Unity style transfer for person re-identification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 6887–6896. https://doi.org/10.1109/CVPR42600.2020.00692

  122. Guo L, Liu J, Wang Y et al (2017) Sketch-based image retrieval using generative adversarial networks. In: Proceedings of the 25th ACM international conference on Multimedia. pp 1267–1268. https://doi.org/10.1145/3123266.3127939

  123. Isola P, Zhu JY, Zhou T et al (2017) Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 1125–1134 https://doi.org/10.1109/CVPR.2017.632

  124. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. Springer International Publishing, 2015, pp. 234–241. https://doi.org/10.1007/978-3-319-24574-4_28

  125. Li C, Wand M (2016) Precomputed real-time texture synthesis with markovian generative adversarial networks. Eur Conf Comput Vis 17(8):702–716. https://doi.org/10.1007/978-3-319-46487-9_43

    Article  Google Scholar 

  126. Zhang J, Shen F, Liu L et al (2018) Generative domain-migration hashing for sketch-to-image retrieval. In: Proceedings of the European conference on computer vision (ECCV), pp 297–314. https://doi.org/10.1007/978-3-030-01216-8_19

  127. Bai C, Chen J, Ma Q et al (2020) Cross-domain representation learning by domain-migration generative adversarial network for sketch based image retrieval. J Vis Commun Image Represent 71:102835. https://doi.org/10.1016/j.jvcir.2020.102835

    Article  Google Scholar 

  128. Song L, Zhang M, Wu X et al (2017) Adversarial discriminative heterogeneous face recognition. arXiv:1709.03675, https://arxiv.org/pdf/1709.03675

  129. Xiong W, Lv Y, Zhang X et al (2020) Learning to translate for cross-source remote sensing image retrieval. IEEE Trans Geosci Remote Sens 58(7):4860–4874. https://doi.org/10.1109/TGRS.2020.2968096

    Article  Google Scholar 

  130. Ferreira RS, Noce J, Oliveira DAB et al (2019) Generating sketch-based synthetic seismic images with generative adversarial networks. IEEE Geosci Remote Sens Lett 17(8):1460–1464. https://doi.org/10.1109/LGRS.2019.2945680

    Article  Google Scholar 

  131. Liu S, Song Z, Liu G, Xu C, Lu H, Yan S (2012) Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set. In: 2012 IEEE conference on computer vision and pattern recognition, pp 3330–3337. https://doi.org/10.1109/CVPR.2012.6248071

  132. Gajic B, Baldrich R (2018) Cross-domain fashion image retrieval. In: 2019 IEEE international conference on cybernetics and computational intelligence (CyberneticsCom), pp 1869–1871. https://doi.org/10.1109/CVPRW.2018.00243

  133. Luo Y, Wang Z, Huang Z, Yang Y, and Lu H (2019) Snap and find: deep discrete cross-domain garment image retrieval. arXiv:1904.02887. http://arxiv.org/abs/1904.02887

  134. Kucer M, Murray N (2019) A detect-then-retrieve model for multi-domain fashion item retrieval. In: 2019 IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), pp 344–353. https://doi.org/10.1109/CVPRW.2019.00047

  135. Park S, Shin M, Ham S, Choe S, Kang Y (2019) Study on fashion image retrieval methods for efficient fashion visual search. In: Proceedings of 2019 IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), pp 316–319. https://doi.org/10.1109/CVPRW.2019.00042

  136. Zhu J-Y, Zheng W-S, Lai J-H, Li SZ (2014) Matching NIR face to VIS face using transduction. In: IEEE transactions on information forensics and security, pp 501–514. https://doi.org/10.1109/TIFS.2014.2299977

  137. Liu F, Gao C, Sun Y et al (2021) Infrared and visible cross-modal image retrieval through shared features. In: IEEE Transactions on circuits and systems for video technology, pp 4485–4496, https://doi.org/10.1109/TCSVT.2020.3048945

  138. Ling H, Wu J, Huang J et al (2020) Attention-based convolutional neural network for deep face recognition. Multimed Tools Appl 79(9):5595–5616. https://doi.org/10.1007/s11042-019-08422-2

    Article  Google Scholar 

  139. Song L, Gong D, Li Z et al (2019) Occlusion robust face recognition based on mask learning with pairwise differential siamese network. In: Proceedings of the IEEE/CVF International conference on computer vision, pp 773–782. https://doi.org/10.1109/ICCV.2019.00086.

  140. Saxena S, Verbeek J (2016) Heterogeneous face recognition with CNNs. In: European conference on computer vision, pp 483–491. Springer, Cham. https://doi.org/10.1007/978-3-319-49409-8_40

  141. Liu X, Song L, Wu X et al (2016) Transferring deep representation for NIR-VIS heterogeneous face recognition. In: 2016 international conference on biometrics (ICB), pp 1–8. https://doi.org/10.1109/ICB.2016.7550064

  142. Wei X, Wang H, Scotney B et al (2020) Minimum margin loss for deep face recognition. Pattern Recogn 97:107012. https://doi.org/10.1016/j.patcog.2019.107012


  143. Wu B, Wu H (2020) Angular discriminative deep feature learning for face verification. In: ICASSP 2020–2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 2133–2137. https://doi.org/10.1109/ICASSP40776.2020.9053675

  144. He R, Wu X, Sun Z et al (2018) Wasserstein CNN: learning invariant features for NIR-VIS face recognition. IEEE Trans Pattern Anal Mach Intell 41(7):1761–1773. https://doi.org/10.1109/TPAMI.2018.2842770


  145. Wang R, Yang J, Yi D et al (2009) An analysis-by-synthesis method for heterogeneous face biometrics. In: International conference on biometrics, pp 319–326. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01793-3_33

  146. Chi M, Plaza A, Benediktsson JA, Sun Z, Shen J, Zhu Y (2016) Big data for remote sensing: challenges and opportunities. In: Proceedings of the IEEE, pp 2207–2219. https://doi.org/10.1109/JPROC.2016.2598228

  147. Zhang Y, Zhou W, Li H (2018) Retrieval across optical and SAR images with deep neural network. In: Pacific rim conference on multimedia, pp 392–402. Springer. https://doi.org/10.1007/978-3-030-00776-8_36

  148. Chaudhuri U, Banerjee B, Bhattacharya A et al (2020) CMIR-NET: A deep learning based model for cross-modal retrieval in remote sensing. Pattern Recogn Lett 131:456–462. https://doi.org/10.1016/j.patrec.2020.02.006


  149. Bui T, Ribeiro L, Ponti M et al (2016) Generalisation and sharing in triplet convnets for sketch based visual search. arXiv:1611.05301. https://arxiv.org/abs/1611.05301

  150. Yu D, Liu Y, Pang Y et al (2018) A multi-layer deep fusion convolutional neural network for sketch based image retrieval. Neurocomputing 296:23–32. https://doi.org/10.1016/j.neucom.2018.03.031


  151. Guissous K, Gouet-Brunet V (2017) Image retrieval based on saliency for urban image contents. In: 2017 seventh international conference on image processing theory, tools and applications (IPTA), pp 1–6. https://doi.org/10.1109/IPTA.2017.8310131

  152. Russell BC, Sivic J, Ponce J et al (2011) Automatic alignment of paintings and photographs depicting a 3D scene. In: 2011 IEEE international conference on computer vision workshops (ICCV workshops), pp 545–552. https://doi.org/10.1109/ICCVW.2011.6130291

  153. Kong B, Supancic J, Ramanan D et al (2017) Cross-domain forensic shoeprint matching. In: British machine vision conference (BMVC), pp 1–5

  154. Chen W, Liu Y, Wang W et al (2021) Deep image retrieval: a survey. arXiv:2101.11282. https://arxiv.org/abs/2101.11282v1


Funding

Funding was provided by the Beijing Natural Science Foundation (Grant No. 4202017), the Key Research and Development Program of Anhui Province of China (Grant No. 202104a07020017), and the Youth Talent Support Program of Beijing Municipal Education Commission (Grant No. CIT&TCD201904050).

Author information


Corresponding author

Correspondence to Jia Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Terminologies of cross-domain image retrieval

Figure 5 presents an example that illustrates the main concepts of CDIR. Suppose you see someone on the street wearing a pair of shoes that you would like to buy. Asking directly feels abrupt and taking a photograph would be impolite, so you memorize the style of the shoes instead. At home, you sketch the shoe and use the sketch to search e-commerce sites for the same model. First, you upload the hand-drawn sketch (source domain) to the e-commerce site. The site analyzes the sketch, extracts its features, and stores them in the feature space. It then searches its database, whose images are preprocessed product pictures, for features similar to the extracted sketch features. A mapping function projects the retrieved features and the sketch features into the same space (common space) so that they can be compared. Finally, the site outputs the search results. The frequently used terminologies are listed in Table 4.

Fig. 5 Sketch retrieval example

Table 4 Terminologies of cross-domain image retrieval
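
The retrieval flow described in this appendix can be sketched in a few lines of Python. This is only a minimal illustration under strong assumptions: the mapping functions are fixed linear matrices (`SKETCH_MAP`, `PHOTO_MAP`), similarity is cosine similarity, and the feature vectors and gallery names are invented toy values. None of these names or numbers come from the survey, and real CDIR systems learn the mappings with deep networks.

```python
import math

def project(features, matrix):
    """Mapping function: project a feature vector into the common space."""
    return [sum(w * x for w, x in zip(row, features)) for row in matrix]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(sketch_features, gallery, sketch_map, photo_map, top_k=2):
    """Rank gallery images by similarity to the sketch in the common space."""
    query = project(sketch_features, sketch_map)
    scored = [
        (cosine(query, project(features, photo_map)), name)
        for name, features in gallery.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]

# Hypothetical toy data: 2-D sketch features, 3-D photo features.
SKETCH_MAP = [[1.0, 0.0], [0.0, 1.0]]           # identity mapping for sketches
PHOTO_MAP = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]  # drops the third photo feature
GALLERY = {
    "sneaker": [2.0, 0.2, 9.0],
    "boot": [0.2, 2.0, 1.0],
    "sandal": [1.0, 1.0, 0.0],
}

ranking = retrieve([1.0, 0.1], GALLERY, SKETCH_MAP, PHOTO_MAP)
print(ranking)
```

In this toy setup the photo mapping discards the third photo feature, standing in for a domain-specific attribute that the sketch does not carry, so that the comparison happens only on the features both domains share.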

Appendix 2: Evaluation metrics of cross-domain image retrieval

The commonly used evaluation metrics for CDIR are shown in Table 5.

Table 6 defines TP, FN, FP, and TN. In these abbreviations, P and N denote a positive and a negative prediction of the model, and T and F indicate whether that prediction is correct. Precision is defined as TP divided by the sum of TP and FP, and recall is defined as TP divided by the sum of TP and FN.

Table 5 Evaluation metrics
Table 6 Definition of retrieval results
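
As a worked example of the precision and recall definitions above, the following sketch counts TP, FP, FN, and TN for a single query. The function name `precision_recall` and the image identifiers are hypothetical, chosen only for illustration; a "positive" here means that the image was returned by the retrieval system.

```python
def precision_recall(retrieved, relevant, database):
    """Count TP, FP, FN, TN for one query and derive precision and recall."""
    retrieved, relevant, database = set(retrieved), set(relevant), set(database)
    tp = len(retrieved & relevant)             # relevant and retrieved
    fp = len(retrieved - relevant)             # retrieved but not relevant
    fn = len(relevant - retrieved)             # relevant but missed
    tn = len(database - retrieved - relevant)  # correctly left out
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn,
            "precision": precision, "recall": recall}

# Hypothetical query over a 10-image database.
database = [f"img{i}" for i in range(10)]
retrieved = ["img0", "img1", "img2", "img3"]
relevant = ["img0", "img1", "img4", "img5", "img6"]

result = precision_recall(retrieved, relevant, database)
print(result)
```

Here two of the four retrieved images are relevant, so the precision is 2/4, and two of the five relevant images are found, so the recall is 2/5.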


About this article


Cite this article

Zhou, X., Han, X., Li, H. et al. Cross-domain image retrieval: methods and applications. Int J Multimed Info Retr 11, 199–218 (2022). https://doi.org/10.1007/s13735-022-00244-7


  • DOI: https://doi.org/10.1007/s13735-022-00244-7
