Abstract
Over the last decade, considerable effort has gone into developing discriminative image representations. Among these, the vector of locally aggregated descriptors (VLAD) has proven effective. However, most VLAD-based methods rely only on detected SIFT descriptors, which capture limited image content and thereby weaken the representation. In this work, we propose a novel framework that boosts VLAD with a weighted fusion of local descriptors (WF-VLAD), which encodes more discriminative cues and achieves higher performance. To obtain an image representation that contains sufficient detail, our approach fuses densely sampled SIFT descriptors (dense SIFT) with SIFT descriptors detected at interest points (detected SIFT) during aggregation. Furthermore, we assign each detected SIFT descriptor a weight measured by saliency analysis, so that salient descriptors receive relatively high importance. The proposed method thus captures sufficient image content and highlights the important image regions. Finally, experiments on publicly available datasets demonstrate that our approach achieves competitive performance in retrieval tasks.
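The fusion idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact weighting or normalization, so this assumes standard VLAD aggregation (sum of residuals to the nearest codebook centroid), a weight of 1 for every dense descriptor, a per-descriptor saliency weight for detected descriptors, and a final L2 normalization. The function names `weighted_vlad` and `wf_vlad_sketch` and the random toy data are hypothetical.

```python
import numpy as np


def weighted_vlad(descriptors, weights, centroids):
    """Sum weighted residuals to the nearest centroid, then L2-normalize."""
    k, d = centroids.shape
    # Squared Euclidean distance from every descriptor to every centroid.
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    vlad = np.zeros((k, d))
    for i in range(k):
        mask = nearest == i
        if mask.any():
            residuals = descriptors[mask] - centroids[i]
            vlad[i] = (weights[mask, None] * residuals).sum(axis=0)
    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad


def wf_vlad_sketch(dense, detected, saliency, centroids):
    """Fuse both descriptor sets; only detected descriptors are saliency-weighted.

    Assumption: dense descriptors all get weight 1 (not stated in the abstract).
    """
    descriptors = np.vstack([dense, detected])
    weights = np.concatenate([np.ones(len(dense)), saliency])
    return weighted_vlad(descriptors, weights, centroids)


rng = np.random.default_rng(0)
centroids = rng.normal(size=(4, 8))   # toy codebook: 4 visual words, 8-D
dense = rng.normal(size=(50, 8))      # stand-in for dense SIFT descriptors
detected = rng.normal(size=(10, 8))   # stand-in for detected SIFT descriptors
saliency = rng.uniform(size=10)       # stand-in saliency weights in [0, 1)
vec = wf_vlad_sketch(dense, detected, saliency, centroids)
print(vec.shape)
```

In this sketch a detected descriptor with saliency weight 0 contributes nothing, so the representation falls back to dense-only VLAD, while larger weights pull the aggregated residuals toward salient regions.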
Acknowledgements
This work was partly supported by the China Scholarship Council (201706035021), the National Natural Science Foundation of China (61175096), the German Research Foundation in project Crossmodal Learning (TRR-169), and the Chinese Government Scholarship under the China Scholarship Council.
Cite this article
Liu, H., Zhao, Q., Zhang, C. et al. Boosting VLAD with weighted fusion of local descriptors for image retrieval. Multimed Tools Appl 78, 11835–11855 (2019). https://doi.org/10.1007/s11042-018-6712-z
DOI: https://doi.org/10.1007/s11042-018-6712-z