Image classification based on improved VLAD

Long, Xianzhong; Lu, Hongtao; Peng, Yong; Wang, Xianzhong; Feng, Shaokun

doi:10.1007/s11042-015-2524-6

Published: 06 March 2015

Volume 75, pages 5533–5555, (2016)

Multimedia Tools and Applications

Xianzhong Long¹, Hongtao Lu², Yong Peng², Xianzhong Wang² & Shaokun Feng²

Abstract

Recently, a coding scheme called vector of locally aggregated descriptors (VLAD) has achieved great success in large-scale image retrieval owing to its efficient, compact representation. VLAD aggregates each descriptor using only its nearest-neighbor visual word in the dictionary. It offers fast retrieval and high retrieval accuracy with a small dictionary. In this paper, we propose three improved VLAD variants for image classification: first, as in the bag of words (BoW) model, we count the number of descriptors assigned to each cluster center and add this count to VLAD; second, to increase the influence of the residuals, we also take squared residuals into account; third, instead of a single nearest-neighbor visual word, we use the two nearest-neighbor visual words to aggregate each descriptor. Experimental results on the UIUC Sports Event, Corel 10, and 15 Scenes datasets show that the proposed methods outperform some state-of-the-art coding schemes in both classification accuracy and computation speed.
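
The abstract describes the three variants only in words, so the following is a minimal sketch of one possible reading, not the formulation used in the paper. Standard VLAD sums, for each visual word, the residuals of the descriptors assigned to it. The function name `vlad`, its flags, the way counts and squared residuals are concatenated to the vector, the equal weighting of the two nearest words, and the final L2 normalization are all assumptions made for illustration.

```python
import numpy as np


def vlad(descriptors, dictionary, add_counts=False, add_squared=False,
         k=1, normalize=True):
    """Illustrative VLAD encoder (hypothetical helper, not the paper's code).

    descriptors: (n, d) local features of one image.
    dictionary:  (K, d) visual words, e.g. k-means cluster centers.
    add_counts:  append the per-word descriptor counts (BoW-style variant).
    add_squared: append the per-word sums of squared residuals.
    k:           number of nearest visual words used per descriptor.
    """
    K, d = dictionary.shape
    # Squared Euclidean distance from every descriptor to every word: (n, K).
    dist2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k nearest words for each descriptor: (n, k).
    nearest = np.argsort(dist2, axis=1)[:, :k]

    residual_sum = np.zeros((K, d))
    squared_sum = np.zeros((K, d))
    counts = np.zeros(K)
    for x, words in zip(descriptors, nearest):
        for j in words:
            r = x - dictionary[j]      # residual to the assigned word
            residual_sum[j] += r
            squared_sum[j] += r ** 2   # assumption: element-wise squares
            counts[j] += 1             # assumption: each assignment counts once

    parts = [residual_sum.ravel()]
    if add_squared:
        parts.append(squared_sum.ravel())
    if add_counts:
        parts.append(counts)
    v = np.concatenate(parts)

    if normalize:                      # assumption: plain L2 normalization
        norm = np.linalg.norm(v)
        if norm > 0:
            v = v / norm
    return v
```

With a dictionary of K words in d dimensions, this sketch yields a vector of length K·d for plain VLAD, K·d + K with counts, and 2·K·d with squared residuals; setting `k=2` keeps the length unchanged but lets each descriptor contribute to two words.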

Acknowledgments

This work is sponsored by NUPTSF (Grant No. NY214168), the National Natural Science Foundation of China (Grant Nos. 61300164, 61272247), the Shanghai Science and Technology Committee (Grant No. 13511500200) and the European Union Seventh Framework Programme (Grant No. 247619).

Author information

Authors and Affiliations

School of Computer Science & Technology, School of Software, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
Xianzhong Long
Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
Hongtao Lu, Yong Peng, Xianzhong Wang & Shaokun Feng

Corresponding author

Correspondence to Xianzhong Long.

About this article

Cite this article

Long, X., Lu, H., Peng, Y. et al. Image classification based on improved VLAD. Multimed Tools Appl 75, 5533–5555 (2016). https://doi.org/10.1007/s11042-015-2524-6

Received: 25 August 2014
Revised: 22 December 2014
Accepted: 18 February 2015
Published: 06 March 2015
Issue Date: May 2016
DOI: https://doi.org/10.1007/s11042-015-2524-6
