Boosting VLAD with weighted fusion of local descriptors for image retrieval

Liu, Hao; Zhao, Qingjie; Zhang, Cong; Mbelwa, Jimmy T.; Tang, Song; Zhang, Jianwei

doi:10.1007/s11042-018-6712-z

Published: 06 October 2018

Volume 78, pages 11835–11855, (2019)

Multimedia Tools and Applications

Hao Liu¹, Qingjie Zhao¹,², Cong Zhang¹, Jimmy T. Mbelwa¹, Song Tang² & Jianwei Zhang²


Abstract

In the last decade, considerable effort has been devoted to discriminative image representations. Among these, the vector of locally aggregated descriptors (VLAD) has proven to be an effective one. However, most VLAD-based methods rely only on SIFT descriptors detected at interest points, which capture limited content information and therefore weaken the representation. In this work, we propose a novel framework that boosts VLAD with weighted fusion of local descriptors (WF-VLAD), which encodes more discriminative clues and achieves higher performance. To obtain an image representation that contains sufficient detail, our approach fuses, during aggregation, SIFT descriptors sampled densely (dense SIFT) with those detected at interest points (detected SIFT). Furthermore, we assign each detected SIFT descriptor a weight measured by saliency analysis, so that salient descriptors receive relatively high importance. The proposed method thus captures sufficient image content while highlighting the important image regions. Finally, experiments on publicly available datasets demonstrate that our approach achieves competitive performance in retrieval tasks.
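As a rough illustration of the idea described in the abstract, the Python sketch below aggregates a fused set of dense and detected descriptors into a single weighted VLAD vector. It is not the authors' implementation: it assumes a precomputed codebook (`centroids`), hard assignment of each descriptor to its nearest visual word, saliency scores in [0, 1] already computed for the detected descriptors, a hypothetical weighting rule (weight 1 for dense descriptors, `1 + alpha * saliency` for detected ones), and the signed square-root plus L2 normalization commonly applied to VLAD. All function names and the `alpha` parameter are hypothetical, and the toy 2-D vectors stand in for 128-D SIFT descriptors.

```python
import math
from typing import List, Sequence

Vector = List[float]


def nearest_centroid(x: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Index of the closest visual word (hard assignment, squared Euclidean distance)."""
    best_k, best_d = 0, float("inf")
    for k, c in enumerate(centroids):
        d = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        if d < best_d:
            best_k, best_d = k, d
    return best_k


def weighted_vlad(
    descriptors: Sequence[Sequence[float]],
    weights: Sequence[float],
    centroids: Sequence[Sequence[float]],
    power: float = 0.5,
) -> Vector:
    """Accumulate weighted residuals per visual word, then power- and L2-normalize."""
    if len(descriptors) != len(weights):
        raise ValueError("one weight per descriptor is required")
    k_words, dim = len(centroids), len(centroids[0])
    v = [0.0] * (k_words * dim)
    for x, w in zip(descriptors, weights):
        k = nearest_centroid(x, centroids)
        for j in range(dim):
            # residual to the assigned centroid, scaled by the descriptor's weight
            v[k * dim + j] += w * (x[j] - centroids[k][j])
    # signed power normalization (power=0.5 gives the signed square root)
    v = [math.copysign(abs(a) ** power, a) for a in v]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else v


def wf_vlad_sketch(
    dense: Sequence[Sequence[float]],
    detected: Sequence[Sequence[float]],
    detected_saliency: Sequence[float],
    centroids: Sequence[Sequence[float]],
    alpha: float = 1.0,
) -> Vector:
    """Hypothetical fusion: dense descriptors get weight 1, detected ones 1 + alpha * saliency."""
    descriptors = list(dense) + list(detected)
    weights = [1.0] * len(dense) + [1.0 + alpha * s for s in detected_saliency]
    return weighted_vlad(descriptors, weights, centroids)


if __name__ == "__main__":
    centroids = [[0.0, 0.0], [10.0, 10.0]]  # toy 2-word codebook in 2-D
    dense = [[1.0, 0.0], [9.0, 10.0]]       # stand-ins for dense SIFT
    detected = [[0.0, 2.0]]                  # stand-in for a detected SIFT
    saliency = [0.5]                         # assumed saliency score in [0, 1]
    print(wf_vlad_sketch(dense, detected, saliency, centroids))
```

Setting `alpha=0` gives every descriptor weight 1, which reduces the sketch to plain VLAD over the fused descriptor set; comparing the two outputs isolates the effect of the saliency weights.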

Similar content being viewed by others

Image Matching from Handcrafted to Deep Features: A Survey

Article Open access 04 August 2020

Image Fusion Techniques: A Survey

Article 24 January 2021

Content-based image retrieval through fusion of deep features extracted from segmented neutrosophic using depth map

Article 09 April 2024


Acknowledgements

This work was partly supported by the China Scholarship Council (201706035021), the National Natural Science Foundation of China (61175096), the German Research Foundation in the project Crossmodal Learning (TRR-169), and the Chinese Government Scholarship under the China Scholarship Council.

Author information

Authors and Affiliations

Beijing Lab of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing, China
Hao Liu, Qingjie Zhao, Cong Zhang & Jimmy T. Mbelwa
Department of Informatics, University of Hamburg, Hamburg, Germany
Qingjie Zhao, Song Tang & Jianwei Zhang


Corresponding author

Correspondence to Hao Liu.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Liu, H., Zhao, Q., Zhang, C. et al. Boosting VLAD with weighted fusion of local descriptors for image retrieval. Multimed Tools Appl 78, 11835–11855 (2019). https://doi.org/10.1007/s11042-018-6712-z


Received: 12 January 2018
Revised: 21 August 2018
Accepted: 19 September 2018
Published: 06 October 2018
Issue Date: May 2019
DOI: https://doi.org/10.1007/s11042-018-6712-z
