Exploiting tf-idf in deep Convolutional Neural Networks for Content Based Image Retrieval

Multimedia Tools and Applications

Abstract

In this paper, we propose a novel term frequency-inverse document frequency (tf-idf) based method that uses deep Convolutional Neural Networks (CNNs) for Content Based Image Retrieval (CBIR). Specifically, we treat the learned filters of the convolutional layers of a CNN model as detectors of visual words. Each filter has been trained to activate on a different visual pattern. Since the activations of a filter indicate the degree to which its learned visual pattern is present in an image, we use these activations as the tf part. We then propose three approaches for computing the idf part. Finally, we propose a query expansion technique on top of the resulting descriptors. The proposed approach connects the standard tf-idf method with modern CNN-based analysis of visual content, and extensive experiments on four challenging image datasets show improved retrieval results.
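The pipeline outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the pooling used for the tf part, the three idf variants, or the query expansion rule. The sketch therefore assumes sum-pooled, L1-normalized activations as tf, a smoothed log idf in which a filter counts as "present" when its maximum activation exceeds a threshold, and average query expansion over the top-ranked results. All function names and the `threshold` and `top_k` parameters are hypothetical.

```python
import numpy as np

def tf_from_activations(feature_maps):
    """tf part (assumed): sum-pool each filter's map, then L1-normalize per image.

    feature_maps: non-negative array of shape (n_images, n_filters, H, W),
    e.g. post-ReLU outputs of one convolutional layer.
    """
    tf = feature_maps.sum(axis=(2, 3))
    totals = tf.sum(axis=1, keepdims=True)
    return tf / np.maximum(totals, 1e-12)

def idf_from_activations(feature_maps, threshold=0.0):
    """idf part (assumed): smoothed log idf over 'filter present' indicators."""
    n_images = feature_maps.shape[0]
    # A filter (visual word) is counted as present if it fires above threshold.
    present = feature_maps.max(axis=(2, 3)) > threshold
    df = present.sum(axis=0)
    return np.log((1 + n_images) / (1 + df)) + 1.0

def tfidf_descriptors(feature_maps, idf):
    """One L2-normalized tf-idf descriptor per image."""
    desc = tf_from_activations(feature_maps) * idf
    norms = np.linalg.norm(desc, axis=1, keepdims=True)
    return desc / np.maximum(norms, 1e-12)

def expand_query(query_desc, db_desc, top_k=3):
    """Average query expansion (assumed): mean of the query and its top_k matches."""
    scores = db_desc @ query_desc          # cosine similarity on unit vectors
    top = np.argsort(-scores)[:top_k]
    expanded = (query_desc + db_desc[top].sum(axis=0)) / (top_k + 1)
    return expanded / max(np.linalg.norm(expanded), 1e-12)

# Toy data: 3 images, 2 filters, 2x2 maps.
# Filter 0 fires in every image; filter 1 fires only in image 0.
maps = np.zeros((3, 2, 2, 2))
maps[:, 0] = 1.0
maps[0, 1] = 2.0

idf = idf_from_activations(maps)
descriptors = tfidf_descriptors(maps, idf)
expanded = expand_query(descriptors[0], descriptors, top_k=2)
```

In this toy example the filter that fires in every image receives the lowest idf weight, mirroring how frequent terms are down-weighted in text retrieval. A real system would replace `maps` with activations extracted from a trained CNN.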

Notes

  1. https://github.com/BVLC/caffe/tree/master/models/bvlc_reference_caffenet

Acknowledgments

Maria Tzelepi was supported by the General Secretariat for Research and Technology (GSRT) and the Hellenic Foundation for Research and Innovation (HFRI) (PhD Scholarship No. 2826).

Author information

Corresponding author

Correspondence to Maria Tzelepi.

Cite this article

Kondylidis, N., Tzelepi, M. & Tefas, A. Exploiting tf-idf in deep Convolutional Neural Networks for Content Based Image Retrieval. Multimed Tools Appl 77, 30729–30748 (2018). https://doi.org/10.1007/s11042-018-6212-1
