Exploiting tf-idf in deep Convolutional Neural Networks for Content Based Image Retrieval

Kondylidis, Nikolaos; Tzelepi, Maria; Tefas, Anastasios

doi:10.1007/s11042-018-6212-1

Published: 01 June 2018

Volume 77, pages 30729–30748, (2018)

Multimedia Tools and Applications

Abstract

In this paper, we propose a novel term frequency-inverse document frequency (tf-idf) based method that uses deep Convolutional Neural Networks (CNNs) for Content Based Image Retrieval (CBIR). We treat the learned filters of the convolutional layers of a CNN model as detectors of visual words. Each filter has been trained to activate on a different visual pattern. Since the activations of a filter indicate the degree to which its learned visual pattern is present in an image, we use these activations as the tf part. We then propose three approaches for computing the idf part. Finally, we propose a query expansion technique on top of the resulting descriptors. The proposed approach connects the standard tf-idf method with modern CNN-based analysis of visual content, and extensive experiments on four challenging image datasets show improved retrieval results.
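
The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not say how activations are pooled, what the three idf variants are, or how the query expansion works, so every concrete choice below is an assumption made for illustration only: tf is taken as the maximum activation of each filter, idf is the textbook log(N/df) with a hypothetical activation threshold deciding whether a visual word "occurs" in an image, and query expansion averages the query with its top-ranked results. The function names and the toy feature maps are likewise hypothetical.

```python
import math

def tf_from_activations(feature_maps):
    # One tf value per filter: the maximum activation in its feature map
    # (assumed pooling; the paper may pool differently).
    return [max(max(row) for row in fmap) for fmap in feature_maps]

def idf_from_collection(tf_vectors, threshold=0.5):
    # Textbook idf = log(N / df). A filter counts as "present" in an image
    # when its tf exceeds the threshold (hypothetical parameter).
    n_images = len(tf_vectors)
    idf = []
    for j in range(len(tf_vectors[0])):
        df = sum(1 for tf in tf_vectors if tf[j] > threshold)
        idf.append(math.log(n_images / df) if df > 0 else 0.0)
    return idf

def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def tfidf_descriptor(tf, idf):
    # Weight each filter response by its idf, then L2-normalize.
    return l2_normalize([t * w for t, w in zip(tf, idf)])

def rank(query, database):
    # Dot product equals cosine similarity for L2-normalized descriptors.
    scores = [sum(q * d for q, d in zip(query, desc)) for desc in database]
    return sorted(range(len(database)), key=lambda i: scores[i], reverse=True)

def expand_query(query, database, top_k=2):
    # Average query expansion (assumed variant): mean of the query and
    # its top_k retrieved descriptors, re-normalized.
    top = rank(query, database)[:top_k]
    mean = [(query[j] + sum(database[i][j] for i in top)) / (len(top) + 1)
            for j in range(len(query))]
    return l2_normalize(mean)

# Toy data: 3 images, 3 filters, each filter yielding a 2x2 feature map.
images = [
    [[[0.9, 0.1], [0.0, 0.2]], [[0.8, 0.0], [0.1, 0.0]], [[0.0, 0.1], [0.0, 0.0]]],
    [[[0.7, 0.0], [0.0, 0.0]], [[0.0, 0.2], [0.1, 0.0]], [[0.9, 0.0], [0.3, 0.0]]],
    [[[0.6, 0.3], [0.0, 0.0]], [[0.9, 0.0], [0.0, 0.4]], [[0.0, 0.0], [0.2, 0.0]]],
]
tfs = [tf_from_activations(img) for img in images]
idf = idf_from_collection(tfs)
db = [tfidf_descriptor(tf, idf) for tf in tfs]
order = rank(db[0], db)                   # retrieve with image 0 as the query
expanded = expand_query(db[0], db)        # expanded query descriptor
order_expanded = rank(expanded, db)
print(order, order_expanded)
```

In this toy collection the first filter fires strongly in every image, so its idf is zero and it contributes nothing to the descriptors, which is the behaviour tf-idf weighting is meant to provide for uninformative visual words.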

Notes

https://github.com/BVLC/caffe/tree/master/models/bvlc_reference_caffenet

Acknowledgments

Maria Tzelepi was supported by the General Secretariat for Research and Technology (GSRT) and the Hellenic Foundation for Research and Innovation (HFRI) (PhD Scholarship No. 2826).

Author information

Authors and Affiliations

Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
Nikolaos Kondylidis, Maria Tzelepi & Anastasios Tefas

Corresponding author

Correspondence to Maria Tzelepi.

About this article

Cite this article

Kondylidis, N., Tzelepi, M. & Tefas, A. Exploiting tf-idf in deep Convolutional Neural Networks for Content Based Image Retrieval. Multimed Tools Appl 77, 30729–30748 (2018). https://doi.org/10.1007/s11042-018-6212-1

Received: 13 June 2017
Revised: 10 April 2018
Accepted: 24 May 2018
Published: 01 June 2018
Issue Date: December 2018
DOI: https://doi.org/10.1007/s11042-018-6212-1
