Towards a Universal and Limited Visual Vocabulary

Hou, Jian; Feng, Zhan-Shen; Yang, Yong; Qi, Nai-Ming

doi:10.1007/978-3-642-24031-7_40

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6939)

Abstract

Bag-of-visual-words is a popular image representation that is widely applied in the image processing community. While its potential has been explored in many respects, its operation still follows a basic mode: for a given dataset, a vocabulary is trained with k-means-like clustering methods. A vocabulary obtained this way is data dependent, i.e., a new dataset requires training a new vocabulary. Building on previous research on determining the optimal vocabulary size, in this paper we investigate the possibility of building a universal and limited visual vocabulary with optimal performance. We analyze why such a vocabulary should exist and conduct extensive experiments on three challenging datasets to validate this hypothesis. We believe this work sheds new light on eventually obtaining a universal visual vocabulary of limited size that can be used with any dataset to obtain the best or near-best performance.
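
The basic mode described in the abstract, training a vocabulary by k-means-like clustering and then representing an image as a histogram of visual words, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`train_vocabulary`, `bow_histogram`, `nearest_word`), the plain k-means variant, the iteration count, and the toy 2-D points standing in for real local descriptors are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster centre) closest to a descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: squared_distance(descriptor, vocabulary[i]))


def train_vocabulary(descriptors, size, iterations=20, seed=0):
    """Plain k-means: returns `size` cluster centres used as visual words."""
    rng = random.Random(seed)
    # Initialise the centres with randomly chosen training descriptors.
    vocabulary = [list(d) for d in rng.sample(descriptors, size)]
    for _ in range(iterations):
        clusters = [[] for _ in range(size)]
        for d in descriptors:
            clusters[nearest_word(d, vocabulary)].append(d)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                vocabulary[i] = [sum(col) / len(members)
                                 for col in zip(*members)]
    return vocabulary


def bow_histogram(image_descriptors, vocabulary):
    """Normalised count of how often each visual word occurs in one image."""
    counts = [0] * len(vocabulary)
    for d in image_descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts) or 1  # avoid division by zero for an empty image
    return [c / total for c in counts]


# Toy data (assumption): 2-D points around two centres stand in for the
# high-dimensional local descriptors extracted from real images.
rng = random.Random(1)
descriptors = [[rng.gauss(c, 0.5), rng.gauss(c, 0.5)]
               for c in (0.0, 10.0) for _ in range(50)]

vocabulary = train_vocabulary(descriptors, size=2)
histogram = bow_histogram(descriptors[:50], vocabulary)
```

Because the centres are fitted to the training descriptors, a vocabulary built this way depends on the dataset it was trained on, which is the data dependence the abstract refers to. The vocabulary size (`size` here) is the parameter whose optimal value the paper builds on.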

Author information

Authors and Affiliations

School of Computer Science and Technology, Xuchang University, China, 461000
Jian Hou & Zhan-Shen Feng
School of Astronautics, Harbin Institute of Technology, Harbin, China, 150001
Yong Yang & Nai-Ming Qi

Editor information

Editors and Affiliations

University of Nevada, 89557, Reno, NV, USA
George Bebis
NASA Ames Research Center, 94035, Moffett Field, CA, USA
Richard Boyle
Lawrence Berkeley National Laboratory, 94720, Berkeley, CA, USA
Bahram Parvin
Desert Research Institute, 89512, Reno, NV, USA
Darko Koracin
Department of Computer Science and Engineering, University of South Carolina, 29208, Columbia, SC, USA
Song Wang
HRL Laboratories, 3011 Malibu Canyon Road, 90265-4797, Malibu, CA, USA
Kim Kyungnam
Purdue University, West Lafayette, 47907-2021, IN, USA
Bedrich Benes
Sandia National Laboratory, 87185, Albuquerque, NM, USA
Kenneth Moreland
University of Louisiana at Lafayette, 70504, LA, USA
Christoph Borst
Adobe Systems Incorporated, San Francisco, CA, USA
Stephen DiVerdi
Polytechnic Institute of NYU, 11201, Brooklyn, NY, USA
Chiang Yi-Jen
Lawrence Livermore National Laboratory, 94551-0808, Livermore, CA, USA
Jiang Ming

About this paper

Cite this paper

Hou, J., Feng, ZS., Yang, Y., Qi, NM. (2011). Towards a Universal and Limited Visual Vocabulary. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2011. Lecture Notes in Computer Science, vol 6939. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24031-7_40

DOI: https://doi.org/10.1007/978-3-642-24031-7_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24030-0
Online ISBN: 978-3-642-24031-7
eBook Packages: Computer Science, Computer Science (R0)
