State of the art content based image retrieval techniques using deep learning: a survey

Multimedia Tools and Applications

Abstract

In recent years, the rapid growth of multimedia content has made image retrieval a challenging research task. Content-based image retrieval (CBIR) is a technique that takes a query image supplied by the user and uses its visual features to search a large image dataset for the images the user requires. Effective feature representation and similarity measures are crucial to the retrieval performance of CBIR. The key challenge has been attributed to the well-known semantic gap, the mismatch between low-level visual features and the high-level concepts that people perceive in an image. Machine learning has been actively investigated as a possible way to bridge this gap, and the recent success of deep learning offers hope of bridging it in CBIR. In this paper, we investigate deep learning approaches used for CBIR tasks under varied settings. From our empirical studies, we draw some encouraging conclusions and insights for future research.
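The two ingredients named above, feature representation and similarity measure, can be illustrated with a minimal sketch. This is not a method from the survey: the intensity histogram is a hand-crafted stand-in for the learned deep features the paper discusses, and the function names (`histogram_feature`, `cosine_similarity`, `retrieve`) and the toy four-pixel "images" are hypothetical, chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def histogram_feature(pixels: Sequence[int], bins: int = 4) -> List[float]:
    """Map 8-bit grayscale pixels to an L2-normalized intensity histogram.

    Assumption: this hand-crafted feature stands in for a deep network
    embedding; a real deep CBIR system would extract CNN features instead.
    """
    counts = [0.0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(
    query: Sequence[int], database: Dict[str, Sequence[int]], top_k: int = 2
) -> List[Tuple[str, float]]:
    """Rank database images by feature similarity to the query image."""
    query_feature = histogram_feature(query)
    scored = [
        (name, cosine_similarity(query_feature, histogram_feature(pixels)))
        for name, pixels in database.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy "images": flat lists of grayscale pixel values (hypothetical data).
database = {
    "dark": [10, 20, 30, 40],
    "bright": [220, 230, 240, 250],
    "mixed": [10, 100, 180, 250],
}
results = retrieve([5, 15, 25, 200], database)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

The mostly dark query ranks the "dark" image first and the "mixed" image second. Replacing `histogram_feature` with a learned embedding changes the representation while leaving the ranking step untouched, which is why the two components can be studied separately.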


Author information

Corresponding author

Correspondence to Rajiv Kapoor.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kapoor, R., Sharma, D. & Gulati, T. State of the art content based image retrieval techniques using deep learning: a survey. Multimed Tools Appl 80, 29561–29583 (2021). https://doi.org/10.1007/s11042-021-11045-1

  • DOI: https://doi.org/10.1007/s11042-021-11045-1
