Abstract
We propose a novel term frequency-inverse document frequency (tf-idf) method that uses deep Convolutional Neural Networks (CNNs) for Content Based Image Retrieval (CBIR). Specifically, we treat the learned filters of the convolutional layers of a CNN model as detectors of visual words. Each filter has been trained to activate on a different visual pattern. Since the activations of a filter indicate the degree to which its learned pattern is present in an image, we use these activations as the tf part. We then propose three approaches for computing the idf part. Finally, we propose a query expansion technique on top of the resulting descriptors. The proposed approach connects the standard tf-idf method with modern CNN-based analysis of visual content, and extensive experiments on four challenging image datasets show improved retrieval results.
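The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the three idf variants or the query expansion rule, so the code below assumes sum-pooled, L1-normalized filter activations as tf, the textbook idf log(N / df) with a hypothetical activation threshold deciding whether a filter "occurs" in an image, and simple average query expansion over the top-k results. All function names and parameters are illustrative.

```python
import math


def term_frequencies(feature_maps):
    """tf per filter: sum of that filter's activations, L1-normalized.

    feature_maps: list (one entry per filter) of 2-D lists of
    non-negative activations. Sum pooling is an assumption here.
    """
    sums = [sum(sum(row) for row in fmap) for fmap in feature_maps]
    total = sum(sums)
    return [s / total if total > 0 else 0.0 for s in sums]


def inverse_document_frequencies(all_tf, threshold=0.0):
    """Textbook idf = log(N / df), one value per filter.

    A filter counts as present in an image when its tf exceeds
    `threshold` (a hypothetical parameter, not from the paper).
    """
    n_images = len(all_tf)
    idf = []
    for j in range(len(all_tf[0])):
        df = sum(1 for tf in all_tf if tf[j] > threshold)
        idf.append(math.log(n_images / max(df, 1)))
    return idf


def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm if norm > 0 else 0.0 for v in vec]


def tfidf_descriptor(tf, idf):
    """Element-wise tf * idf, L2-normalized so dot product = cosine."""
    return l2_normalize([t * w for t, w in zip(tf, idf)])


def rank(query, database):
    """Database indices sorted by decreasing cosine similarity."""
    def similarity(i):
        return sum(q * d for q, d in zip(query, database[i]))
    return sorted(range(len(database)), key=lambda i: -similarity(i))


def expand_query(query, database, top_k):
    """Average query expansion (assumed variant): mean of the query
    and its top_k retrieved descriptors, re-normalized."""
    top = rank(query, database)[:top_k]
    mean = [
        (query[d] + sum(database[i][d] for i in top)) / (1 + len(top))
        for d in range(len(query))
    ]
    return l2_normalize(mean)
```

A typical use would compute `term_frequencies` for every database image, derive `idf` once from the whole collection, build each descriptor with `tfidf_descriptor`, and then call `rank` with either the raw query descriptor or the output of `expand_query`. Note that under this idf definition a filter that fires in every image receives weight zero, which is the usual tf-idf behaviour for uninformative terms.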
Acknowledgments
Maria Tzelepi was supported by the General Secretariat for Research and Technology (GSRT) and the Hellenic Foundation for Research and Innovation (HFRI) (PhD Scholarship No. 2826).
Cite this article
Kondylidis, N., Tzelepi, M. & Tefas, A. Exploiting tf-idf in deep Convolutional Neural Networks for Content Based Image Retrieval. Multimed Tools Appl 77, 30729–30748 (2018). https://doi.org/10.1007/s11042-018-6212-1
DOI: https://doi.org/10.1007/s11042-018-6212-1