
Towards a Universal and Limited Visual Vocabulary

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6939)

Abstract

Bag-of-visual-words is a popular image representation that is widely used in the image processing community. While its potential has been explored in many respects, its operation still follows a basic pattern: for a given dataset, a vocabulary is trained with a k-means-like clustering method. A vocabulary obtained this way is data dependent, i.e., a new vocabulary must be trained for each new dataset. Building on previous research on determining the optimal vocabulary size, in this paper we investigate the possibility of building a universal visual vocabulary of limited size with optimal performance. We analyze why such a vocabulary should exist and conduct extensive experiments on three challenging datasets to validate this hypothesis. We believe this work sheds new light on ultimately obtaining a universal visual vocabulary of limited size that can be used with any dataset to obtain the best or near-best performance.
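
For readers unfamiliar with the basic pattern the abstract refers to, the following is a minimal sketch of training a vocabulary by plain k-means and encoding an image as a histogram of visual words. It is not the authors' implementation: the function names (`train_vocabulary`, `bovw_histogram`), the toy 2-D descriptors, and the fixed iteration count are all assumptions made for illustration. In practice the descriptors would typically be local features such as SIFT vectors.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center (visual word) closest to `point`."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def train_vocabulary(descriptors, k, iterations=20, seed=0):
    """Plain k-means (hypothetical helper): returns k cluster centers."""
    rng = random.Random(seed)
    # Initialize centers from k distinct training descriptors.
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(d, centers)].append(d)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bovw_histogram(descriptors, vocabulary):
    """Normalized count of descriptors assigned to each visual word."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest(d, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy usage: four 2-D "descriptors" forming two well-separated groups.
descs = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]]
vocab = train_vocabulary(descs, k=2)
hist = bovw_histogram(descs, vocab)
```

Because the centers are fitted to the training descriptors, a vocabulary built this way depends on the dataset it was trained on, which is the limitation the paper sets out to address.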




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hou, J., Feng, ZS., Yang, Y., Qi, NM. (2011). Towards a Universal and Limited Visual Vocabulary. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2011. Lecture Notes in Computer Science, vol 6939. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24031-7_40


  • DOI: https://doi.org/10.1007/978-3-642-24031-7_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24030-0

  • Online ISBN: 978-3-642-24031-7

  • eBook Packages: Computer Science, Computer Science (R0)
