Abstract
The vector of locally aggregated descriptors (VLAD) has proven efficient and effective in image retrieval and classification tasks. However, because the method adopts a small codebook, its division of the feature space is coarse and its discriminative power is limited. Toward a discriminative and compact image representation for visual search, we develop a novel aggregation method for building VLAD, called two-step aggregated VLAD. First, we propose a bidirectional quantization, performed from the views of both descriptors and visual words, to obtain a finer division of the feature space. Second, we weight the local descriptors by a probabilistic inverse document frequency to highlight the discriminative ones. Experimental results on extensive datasets show that our method yields significant improvements and is competitive with state-of-the-art methods.
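To make the baseline concrete, the following is a minimal sketch of standard VLAD aggregation (the representation the paper builds on, not the authors' two-step variant): each local descriptor is assigned to its nearest visual word, residuals are accumulated per word, and the result is power- and L2-normalized. Array shapes and the `vlad` function name are illustrative assumptions, not from the paper.

```python
import numpy as np

def vlad(descriptors, codebook):
    """Aggregate local descriptors (N x D) against a codebook (K x D)
    into a single K*D-dimensional VLAD vector."""
    K, D = codebook.shape
    # Hard-assign each descriptor to its nearest visual word (Euclidean).
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    assign = np.argmin(dists, axis=1)
    # Accumulate residuals (descriptor minus centroid) per visual word.
    v = np.zeros((K, D))
    for i, k in enumerate(assign):
        v[k] += descriptors[i] - codebook[k]
    # Signed square-root (power) normalization, commonly used to reduce
    # the influence of bursty visual elements, then global L2 normalization.
    v = np.sign(v) * np.sqrt(np.abs(v))
    v = v.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

The paper's method replaces the hard nearest-word assignment above with a bidirectional quantization and multiplies each residual by a probabilistic IDF weight before accumulation.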
Funding
This work was partly supported by the China Scholarship Council (201706035021), the National Natural Science Foundation of China (61175096), the German Research Foundation in the project Crossmodal Learning (TRR-169), and a Chinese Government Scholarship under the China Scholarship Council.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Liu, H., Zhao, Q., Mbelwa, J.T. et al. Weighted two-step aggregated VLAD for image retrieval. Vis Comput 35, 1783–1795 (2019). https://doi.org/10.1007/s00371-018-1573-z