
Weighted two-step aggregated VLAD for image retrieval

  • Original Article

The Visual Computer

Abstract

The vector of locally aggregated descriptors (VLAD) has been demonstrated to be efficient and effective in image retrieval and classification tasks. However, because the method adopts a small codebook, its division of the feature space is coarse and its discriminative power is limited. Toward a discriminative and compact image representation for visual search, we develop a novel aggregation method for building VLAD, called two-step aggregated VLAD. First, we propose a bidirectional quantization that considers both the descriptors' and the visual words' views, yielding a finer division of the feature space. Second, we weight the local descriptors by a probabilistic inverse document frequency, highlighting the discriminative ones. Experimental results on several benchmark datasets show that our method yields significant improvement and is competitive with state-of-the-art methods.
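For readers unfamiliar with the baseline the abstract builds on, the sketch below shows standard VLAD aggregation extended with an optional per-descriptor weight. This is a generic illustration under stated assumptions, not the paper's exact method: the bidirectional quantization step is not reproduced here, and `prob_idf` is a textbook probabilistic-IDF formula included only to suggest how a descriptor weight might be derived.

```python
import numpy as np

def prob_idf(n_containing, n_total):
    """Textbook probabilistic IDF: log((N - n_t) / n_t), clamped to
    avoid division by zero. Assumed for illustration; the paper's
    exact weighting scheme is not detailed on this page."""
    return np.log(max(n_total - n_containing, 1) / max(n_containing, 1))

def vlad(descriptors, codebook, weights=None):
    """Standard VLAD: assign each local descriptor to its nearest
    visual word and accumulate the (optionally weighted) residuals.

    descriptors: (n, d) array of local descriptors (e.g. SIFT)
    codebook:    (k, d) array of visual words (k-means centroids)
    weights:     optional (n,) per-descriptor weights
    """
    n, d = descriptors.shape
    k = codebook.shape[0]
    if weights is None:
        weights = np.ones(n)
    # nearest visual word for each descriptor (hard assignment)
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    assign = dists.argmin(axis=1)
    # accumulate weighted residuals per visual word
    v = np.zeros((k, d))
    for i in range(n):
        v[assign[i]] += weights[i] * (descriptors[i] - codebook[assign[i]])
    # signed square-root and global L2 normalization, as commonly used
    v = np.sign(v) * np.sqrt(np.abs(v))
    v = v.reshape(-1)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

The resulting vector has dimension k·d and unit L2 norm; a finer quantization or a discriminative weighting, as proposed in the paper, changes only the assignment and the per-descriptor weights in this template.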



Funding

This work was partly supported by the China Scholarship Council (201706035021), the National Natural Science Foundation of China (61175096), the German Research Foundation in the project Crossmodal Learning (TRR-169), and a Chinese Government Scholarship under the China Scholarship Council.

Author information


Corresponding author

Correspondence to Hao Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


About this article


Cite this article

Liu, H., Zhao, Q., Mbelwa, J.T. et al. Weighted two-step aggregated VLAD for image retrieval. Vis Comput 35, 1783–1795 (2019). https://doi.org/10.1007/s00371-018-1573-z
