Image classification based on improved VLAD

Multimedia Tools and Applications

Abstract

Recently, a coding scheme called the vector of locally aggregated descriptors (VLAD) has achieved considerable success in large-scale image retrieval owing to its efficient, compact representation. VLAD aggregates each descriptor using only its nearest-neighbor visual word in the dictionary. It offers fast retrieval and high retrieval accuracy with a small dictionary. In this paper, we propose three improved VLAD variants for image classification. First, as in the bag of words (BoW) model, we count the number of descriptors belonging to each cluster center and add this count to VLAD. Second, to expand the impact of residuals, squared residuals are taken into account. Third, instead of a single nearest-neighbor visual word, we use the two nearest-neighbor visual words to aggregate each descriptor. Experimental results on the UIUC Sports Event, Corel 10 and 15 Scenes datasets show that the proposed methods outperform some state-of-the-art coding schemes in terms of classification accuracy and computation speed.
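The three variants described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state how the counts and squared residuals are combined with the residual sums, how the two nearest visual words are weighted, or how the final vector is normalized. The sketch assumes plain concatenation, equal weighting, and optional L2 normalization; the function name `vlad_variants` and all of its parameters are hypothetical.

```python
import numpy as np

def vlad_variants(X, C, use_counts=False, use_squared=False,
                  n_neighbors=1, normalize=True):
    """Aggregate descriptors X (n, d) against visual words C (k, d).

    Assumptions (not taken from the paper): extra terms are concatenated
    to the residual sums, every selected neighbor gets equal weight, and
    the result is L2-normalized when normalize=True.
    """
    k, d = C.shape
    # Squared Euclidean distance from every descriptor to every visual word.
    dist2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    # Indices of the n_neighbors closest visual words per descriptor.
    nearest = np.argsort(dist2, axis=1)[:, :n_neighbors]

    resid = np.zeros((k, d))     # classic VLAD: sum of residuals per word
    resid_sq = np.zeros((k, d))  # variant 2: sum of squared residuals
    counts = np.zeros(k)         # variant 1: BoW-style assignment counts

    for j in range(n_neighbors):  # variant 3 uses n_neighbors=2
        idx = nearest[:, j]
        r = X - C[idx]
        # np.add.at accumulates correctly when idx contains repeats.
        np.add.at(resid, idx, r)
        np.add.at(resid_sq, idx, r ** 2)
        np.add.at(counts, idx, 1.0)

    parts = [resid.ravel()]
    if use_squared:
        parts.append(resid_sq.ravel())
    if use_counts:
        parts.append(counts)
    v = np.concatenate(parts)

    if normalize:
        norm = np.linalg.norm(v)
        if norm > 0:
            v = v / norm
    return v
```

With a dictionary of k words and d-dimensional descriptors, this sketch yields a vector of length k·d for plain VLAD, k·d + k when counts are appended, and 2·k·d when squared residuals are appended.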

Acknowledgments

This work is sponsored by NUPTSF (Grant No. NY214168), National Natural Science Foundation of China (Grant No. 61300164, 61272247), Shanghai Science and Technology Committee (Grant No. 13511500200) and European Union Seventh Framework Programme (Grant No. 247619).

Author information

Corresponding author

Correspondence to Xianzhong Long.

Cite this article

Long, X., Lu, H., Peng, Y. et al. Image classification based on improved VLAD. Multimed Tools Appl 75, 5533–5555 (2016). https://doi.org/10.1007/s11042-015-2524-6

  • DOI: https://doi.org/10.1007/s11042-015-2524-6
