Abstract
Recently, a coding scheme called the vector of locally aggregated descriptors (VLAD) has achieved great success in large-scale image retrieval owing to its compact and efficient representation. VLAD aggregates each local descriptor using only its nearest visual word in the dictionary, and it offers fast retrieval and high retrieval accuracy with a small dictionary. In this paper, we propose three improved VLAD variants for image classification. First, similar to the bag-of-words (BoW) model, we count the number of descriptors assigned to each cluster center and add these counts to VLAD. Second, to amplify the impact of residuals, we also take squared residuals into account. Third, instead of using only the single nearest visual word, we aggregate each descriptor with its two nearest visual words. Experimental results on the UIUC Sports Event, Corel 10 and 15 Scenes datasets show that the proposed methods outperform several state-of-the-art coding schemes in both classification accuracy and computation speed.
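The abstract names the three modifications but does not give their formulas. The pure-Python sketch below is one plausible reading, not the authors' implementation. Plain VLAD sums, for each visual word, the residuals (descriptor minus center) of the descriptors assigned to it, concatenates these sums and L2-normalizes the result. The function name `vlad_encode`, its flags, the block ordering (residuals, then squared residuals, then a K-bin count histogram), the unweighted two-nearest-neighbor assignment and the single global L2 normalization are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _l2_normalize(v: List[float]) -> List[float]:
    """Scale v to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v] if norm > 0 else v


def vlad_encode(
    descriptors: Sequence[Vector],
    centers: Sequence[Vector],
    n_neighbors: int = 1,
    with_counts: bool = False,
    with_squared: bool = False,
) -> List[float]:
    """Aggregate local descriptors into one VLAD-style image vector.

    Hypothetical helper. n_neighbors=1 with both flags off is plain VLAD.
    Assumed mapping to the three variants: with_counts appends a BoW count
    histogram, with_squared appends summed squared residuals, and
    n_neighbors=2 assigns each descriptor (unweighted) to its two nearest
    visual words.
    """
    K, d = len(centers), len(centers[0])
    if not 1 <= n_neighbors <= K:
        raise ValueError("n_neighbors must be between 1 and the number of centers")

    residual = [[0.0] * d for _ in range(K)]  # sum of (x - c_k) per word
    squared = [[0.0] * d for _ in range(K)]   # sum of (x - c_k)**2 per word
    counts = [0] * K                          # descriptors assigned per word

    for x in descriptors:
        # Hard-assign x to its n_neighbors nearest visual words.
        nearest = sorted(range(K), key=lambda k: _sq_dist(x, centers[k]))[:n_neighbors]
        for k in nearest:
            counts[k] += 1
            for j in range(d):
                r = x[j] - centers[k][j]
                residual[k][j] += r
                squared[k][j] += r * r

    # Concatenate blocks: K*d residual sums, then optional K*d squared
    # residual sums, then an optional K-bin count histogram.
    out = [r for row in residual for r in row]
    if with_squared:
        out += [s for row in squared for s in row]
    if with_counts:
        out += [float(c) for c in counts]
    return _l2_normalize(out)


# Usage: two 2-D visual words and three 2-D descriptors.
centers = [[0.0, 0.0], [10.0, 0.0]]
descs = [[1.0, 2.0], [9.0, -1.0], [2.0, 0.0]]
print(len(vlad_encode(descs, centers)))                     # 4
print(len(vlad_encode(descs, centers, with_counts=True)))   # 6
print(len(vlad_encode(descs, centers, with_squared=True)))  # 8
```

In practice the visual words would typically come from k-means on training descriptors, and array operations would replace the inner loops; plain lists keep the sketch dependency-free. Because counts and squared residuals live on different scales than raw residuals, a real system might normalize each block separately before concatenation; this sketch does not.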
Acknowledgments
This work is sponsored by NUPTSF (Grant No. NY214168), National Natural Science Foundation of China (Grant No. 61300164, 61272247), Shanghai Science and Technology Committee (Grant No. 13511500200) and European Union Seventh Framework Programme (Grant No. 247619).
About this article
Cite this article
Long, X., Lu, H., Peng, Y. et al. Image classification based on improved VLAD. Multimed Tools Appl 75, 5533–5555 (2016). https://doi.org/10.1007/s11042-015-2524-6