Abstract
The bag of visual words (BOW) model is an efficient image representation technique for image categorization and annotation tasks. Building good visual vocabularies from automatically extracted image feature vectors produces discriminative visual words, which can improve the accuracy of image categorization. Most approaches that use the BOW model to categorize images ignore useful information that can be obtained from image classes when building visual vocabularies. Moreover, most BOW models use intensity features extracted from local regions and disregard colour information, which is an important characteristic of any natural scene image. In this paper, we show that integrating visual vocabularies generated from each image category improves the BOW image representation and the accuracy of natural scene image classification. We use a keypoint density-based weighting method to combine the BOW representation with image colour information on a spatial pyramid layout. In addition, we show that visual vocabularies generated from the training images of one scene image dataset can plausibly represent another scene image dataset in the same domain, which reduces the time and effort needed to build new visual vocabularies. The proposed approach is evaluated on three well-known scene classification datasets with 6, 8 and 15 scene categories, respectively, using 10-fold cross-validation. The experimental results, obtained using support vector machines with a histogram intersection kernel, show that the proposed approach outperforms baseline methods such as Gist features, rgbSIFT features and different configurations of the BOW model.
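The histogram intersection kernel mentioned above can be illustrated with a minimal sketch. It assumes the standard definition (the sum of bin-wise minima of two histograms); the function names and the toy histograms are hypothetical and are not taken from the paper.

```python
def histogram_intersection(h1, h2):
    """Histogram intersection kernel: sum of bin-wise minima (standard definition)."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def gram_matrix(histograms):
    """Pairwise kernel values between all histograms in a list."""
    return [[histogram_intersection(p, q) for q in histograms] for p in histograms]


# Toy L1-normalised histograms over a 4-word vocabulary (made-up values).
h_a = [0.5, 0.25, 0.25, 0.0]
h_b = [0.25, 0.25, 0.0, 0.5]

k_ab = histogram_intersection(h_a, h_b)  # 0.25 + 0.25 + 0.0 + 0.0
gram = gram_matrix([h_a, h_b])
```

A Gram matrix of this kind is what an SVM implementation that accepts precomputed kernels would consume; how the authors passed the kernel to their classifier is not stated here.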
Abbreviations
- $M$: Number of classes
- $C$: Set of $M$ scene classes
- $V$: Set of $M$ class-specific vocabularies
- $V_j$: Set of $k$ visual words learned from training images of class $j$
- $v_i$: $i$th visual word
- $u_j$: $j$th visual word
- $|V|$: Size of visual vocabulary
- $h(d)$: Histogram of visual words for image $d$
- $h_i(d)$: Number of descriptors in image $d$ assigned to visual word $v_i$
- $N_d$: Total number of descriptors in image $d$
- $L$: Number of levels on the spatial pyramid layout
- $h^l(d_{r_i})$: Histogram vector of BOW for image $d$ at level $l$ and sub-region $r_i$
- $c^l(d_{r_i})$: Colour moments vector for image $d$ at level $l$ and sub-region $r_i$
- $m$: Number of images in the training image dataset
- $T$: Real-valued threshold vector
- $T^l_{r_i}$: Average density of keypoints at level $l$ and image sub-region $r_i$ over $m$ images
- $H(d)$: Feature vector for image $d$ resulting from the concatenation of BOW and weighted pyramidal colour moments
- $w$: Weight vector that indicates the importance of colour information
- $K$: Kernel function
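To make the notation above concrete, the following is a minimal sketch of how the class-specific vocabularies V_j could be integrated into one vocabulary V and used to build the histogram h(d). It rests on several assumptions that the list does not confirm: integration is plain concatenation, descriptors are assigned to the nearest visual word by Euclidean distance, and the histogram is normalised by N_d. All function names and the toy 2-D data are hypothetical.

```python
import math


def integrate_vocabularies(class_vocabularies):
    """Concatenate the M class-specific vocabularies V_j into one vocabulary V (assumed)."""
    return [word for vocab in class_vocabularies for word in vocab]


def nearest_word(descriptor, vocabulary):
    """Index i of the visual word v_i closest to the descriptor (Euclidean distance)."""
    return min(range(len(vocabulary)),
               key=lambda i: math.dist(descriptor, vocabulary[i]))


def bow_histogram(descriptors, vocabulary):
    """h(d): bin i holds h_i(d) / N_d (normalisation by N_d is an assumption)."""
    counts = [0] * len(vocabulary)
    for desc in descriptors:
        counts[nearest_word(desc, vocabulary)] += 1  # increments h_i(d)
    n_d = len(descriptors)
    if n_d == 0:
        return [0.0] * len(vocabulary)
    return [c / n_d for c in counts]


# Toy example: M = 2 classes, k = 2 visual words per class, 2-D "descriptors".
V_1 = [(0.0, 0.0), (0.0, 1.0)]
V_2 = [(1.0, 0.0), (1.0, 1.0)]
V = integrate_vocabularies([V_1, V_2])  # |V| = M * k = 4

descriptors_d = [(0.1, 0.1), (0.9, 0.1), (0.9, 0.9), (0.2, 0.0)]
h_d = bow_histogram(descriptors_d, V)
```

In practice the descriptors would be high-dimensional local features and each V_j would come from clustering the training descriptors of class j; the toy coordinates only keep the arithmetic easy to follow.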
Cite this article
Alqasrawi, Y., Neagu, D. & Cowling, P.I. Fusing integrated visual vocabularies-based bag of visual words and weighted colour moments on spatial pyramid layout for natural scene image classification. SIViP 7, 759–775 (2013). https://doi.org/10.1007/s11760-011-0266-0
DOI: https://doi.org/10.1007/s11760-011-0266-0