Abstract
Bag-of-visual-words is a popular image representation that is widely applied in the image processing community. While its potential has been explored in many aspects, its operation still follows a basic mode: for a given dataset, a k-means-like clustering method is used to train a vocabulary. A vocabulary obtained this way is data dependent, i.e., a new vocabulary must be trained for each new dataset. Building on previous research on determining the optimal vocabulary size, in this paper we investigate the possibility of building a universal visual vocabulary of limited size with optimal performance. We analyze why such a vocabulary should exist and conduct extensive experiments on three challenging datasets to validate this hypothesis. We believe this work sheds new light on eventually obtaining a universal visual vocabulary of limited size that can be used with any dataset to obtain the best or near-best performance.
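The basic mode described above (cluster local descriptors into a vocabulary, then represent each image as a histogram of visual words) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the plain Lloyd's k-means, the vocabulary size of 16, and the random 128-dimensional vectors standing in for local descriptors are all hypothetical choices made for the example.

```python
import numpy as np

def assign_words(descriptors, vocabulary):
    """Index of the nearest visual word for every descriptor."""
    # Squared Euclidean distances, shape (n_descriptors, n_words).
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def train_vocabulary(descriptors, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns k cluster centres (the visual words)."""
    rng = np.random.default_rng(seed)
    # Initialise the centres from k distinct descriptors.
    idx = rng.choice(len(descriptors), size=k, replace=False)
    centres = descriptors[idx].astype(float)
    for _ in range(n_iter):
        labels = assign_words(descriptors, centres)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return centres

def bovw_histogram(descriptors, vocabulary):
    """L1-normalised histogram of visual-word occurrences for one image."""
    words = assign_words(descriptors, vocabulary)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

# Synthetic stand-ins for local descriptors; assumed, not real data.
rng = np.random.default_rng(42)
dataset_a = rng.normal(size=(500, 128))            # descriptors pooled from one dataset
image_from_b = rng.normal(loc=0.5, size=(60, 128))  # one image from another dataset

# Train on dataset A, then reuse the same vocabulary on an image from dataset B.
vocabulary = train_vocabulary(dataset_a, k=16)
hist = bovw_histogram(image_from_b, vocabulary)
```

Reusing a vocabulary trained on one dataset to encode an image from another, as in the last two lines, is the scenario a universal vocabulary is meant to make reliable. The sketch says nothing about how well such reuse performs.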
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hou, J., Feng, ZS., Yang, Y., Qi, NM. (2011). Towards a Universal and Limited Visual Vocabulary. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2011. Lecture Notes in Computer Science, vol 6939. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24031-7_40
DOI: https://doi.org/10.1007/978-3-642-24031-7_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24030-0
Online ISBN: 978-3-642-24031-7
eBook Packages: Computer Science, Computer Science (R0)