Boosting VLAD with weighted fusion of local descriptors for image retrieval

Multimedia Tools and Applications

Abstract

Over the last decade, considerable effort has been devoted to discriminative image representations. Among them, the vector of locally aggregated descriptors (VLAD) has proven effective. However, most VLAD-based methods rely only on SIFT descriptors extracted at detected interest points, which capture limited image content and thus weaken the representation. In this work, we propose a novel framework that boosts VLAD with a weighted fusion of local descriptors (WF-VLAD), which encodes more discriminative clues and achieves higher performance. To obtain an image representation that contains sufficient detail, our approach fuses densely sampled SIFT descriptors (dense SIFT) with SIFT descriptors detected at interest points (detected SIFT) during aggregation. Furthermore, we assign each detected SIFT descriptor a weight measured by saliency analysis, so that salient descriptors receive relatively high importance. The proposed method thus captures sufficient image content and highlights the important image regions. Finally, experiments on publicly available datasets demonstrate that our approach achieves competitive performance in retrieval tasks.
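
The aggregation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `weighted_vlad` and `fuse_and_encode`, the choice of unit weight for dense SIFT descriptors, the hard nearest-centroid assignment, and the final global L2 normalization are all assumptions made for illustration, and the codebook `centroids` is assumed to have been learned beforehand (for example by k-means).

```python
import numpy as np


def weighted_vlad(descriptors, weights, centroids):
    """Aggregate weighted residuals to the nearest centroid (VLAD-style).

    descriptors: (n, d) local descriptors
    weights:     (n,)   per-descriptor importance (assumed non-negative)
    centroids:   (k, d) visual-word codebook (assumed precomputed)
    """
    k, d = centroids.shape
    # Squared Euclidean distance from every descriptor to every centroid.
    diffs = descriptors[:, None, :] - centroids[None, :, :]
    nearest = (diffs ** 2).sum(axis=2).argmin(axis=1)

    vlad = np.zeros((k, d))
    for i in range(k):
        mask = nearest == i
        if mask.any():
            residuals = descriptors[mask] - centroids[i]
            # Each residual contributes in proportion to its weight.
            vlad[i] = (weights[mask, None] * residuals).sum(axis=0)

    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    # Global L2 normalization (an assumption; other schemes are possible).
    return vlad / norm if norm > 0 else vlad


def fuse_and_encode(dense, detected, saliency, centroids):
    """Fuse dense and detected descriptors into one weighted VLAD vector.

    Assumption: dense descriptors get weight 1, detected descriptors get
    their saliency score as weight.
    """
    descriptors = np.vstack([dense, detected])
    weights = np.concatenate([np.ones(len(dense)), saliency])
    return weighted_vlad(descriptors, weights, centroids)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centroids = rng.normal(size=(4, 8))   # toy codebook, 4 visual words
    dense = rng.normal(size=(50, 8))      # stand-in for dense SIFT
    detected = rng.normal(size=(10, 8))   # stand-in for detected SIFT
    saliency = rng.uniform(size=10)       # stand-in for saliency scores
    code = fuse_and_encode(dense, detected, saliency, centroids)
    print(code.shape)
```

In this sketch the saliency scores only rescale the residuals of the detected descriptors, so a salient keypoint pulls its visual word's sub-vector more strongly than a background one, while the dense descriptors supply coverage of the whole image.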

Acknowledgements

This work was partly supported by the China Scholarship Council (201706035021), the National Natural Science Foundation of China (61175096), the German Research Foundation in the project Crossmodal Learning (TRR-169), and a Chinese Government Scholarship under the China Scholarship Council.

Author information

Corresponding author

Correspondence to Hao Liu.

About this article

Cite this article

Liu, H., Zhao, Q., Zhang, C. et al. Boosting VLAD with weighted fusion of local descriptors for image retrieval. Multimed Tools Appl 78, 11835–11855 (2019). https://doi.org/10.1007/s11042-018-6712-z

  • DOI: https://doi.org/10.1007/s11042-018-6712-z
