
Fusing integrated visual vocabularies-based bag of visual words and weighted colour moments on spatial pyramid layout for natural scene image classification

Original Paper

Signal, Image and Video Processing

Abstract

The bag of visual words (BOW) model is an efficient image representation technique for image categorization and annotation tasks. Building good visual vocabularies from automatically extracted image feature vectors produces discriminative visual words, which can improve the accuracy of image categorization. Most approaches that use the BOW model to categorize images ignore useful information that could be obtained from the image classes when building visual vocabularies. Moreover, most BOW models use intensity features extracted from local regions and disregard colour information, an important characteristic of natural scene images. In this paper, we show that integrating visual vocabularies generated from each image category improves the BOW image representation and the accuracy of natural scene image classification. We use a keypoint density-based weighting method to combine the BOW representation with image colour information on a spatial pyramid layout. In addition, we show that visual vocabularies generated from the training images of one scene image dataset can plausibly represent another scene image dataset in the same domain, which reduces the time and effort needed to build new visual vocabularies. The proposed approach is evaluated on three well-known scene classification datasets with 6, 8 and 15 scene categories, respectively, using 10-fold cross-validation. The experimental results, obtained with support vector machines using a histogram intersection kernel, show that the proposed approach outperforms baseline methods such as Gist features, rgbSIFT features and different configurations of the BOW model.
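The core BOW pipeline described in the abstract can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes plain k-means (Lloyd's algorithm) learns the k words of each class vocabulary $V_j$, that "integrating" the vocabularies means concatenating them (so $|V| = M \cdot k$), that descriptors are hard-assigned to their nearest word, and that $h(d)$ is normalised by $N_d$. Toy 2-D vectors stand in for real local descriptors, and every function name here is hypothetical.

```python
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(desc: Vector, words: Sequence[Vector]) -> int:
    """Index of the visual word closest to a descriptor (hard assignment)."""
    return min(range(len(words)), key=lambda i: sq_dist(desc, words[i]))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's k-means; the k centres act as visual words (assumed clusterer)."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centres)].append(p)
        for c, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centres


def integrated_vocabulary(by_class: Dict[str, List[Vector]], k: int) -> List[List[float]]:
    """Learn V_j (k words) per class j and concatenate them, so |V| = M * k."""
    vocab: List[List[float]] = []
    for label in sorted(by_class):  # fixed class order keeps histogram bins stable
        vocab.extend(kmeans(by_class[label], k))
    return vocab


def bow_histogram(descriptors: List[Vector], vocab: List[List[float]]) -> List[float]:
    """h(d): count h_i(d) per word v_i, then divide by N_d (assumed normalisation)."""
    h = [0.0] * len(vocab)
    for desc in descriptors:
        h[nearest(desc, vocab)] += 1.0
    n_d = len(descriptors)
    return [count / n_d for count in h] if n_d else h


def histogram_intersection(h1: Vector, h2: Vector) -> float:
    """Histogram intersection kernel K(h1, h2) = sum_i min(h1_i, h2_i)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


if __name__ == "__main__":
    toy = {
        "coast": [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]],
        "forest": [[10.0, 0.0], [10.1, 0.1], [0.0, 10.0], [0.1, 10.0]],
    }
    V = integrated_vocabulary(toy, k=2)  # 2 classes x 2 words per class
    h = bow_histogram([[0.05, 0.0], [10.0, 0.05]], V)
    print(len(V), histogram_intersection(h, h))
```

Fixing the class order when concatenating matters because an SVM with the intersection kernel compares histograms bin by bin, so every image must map word $v_i$ to the same position.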


Abbreviations

$M$: Number of classes

$C$: Set of $M$ scene classes

$V$: Set of $M$ class-specific vocabularies

$V_j$: Set of $k$ visual words learned from training images of class $j$

$v_i$: $i$th visual word

$u_j$: $j$th visual word

$|V|$: Size of the visual vocabulary

$h(d)$: Histogram of visual words for image $d$

$h_i(d)$: Number of descriptors in image $d$ assigned to visual word $v_i$ (the $i$th bin of $h(d)$)

$N_d$: Total number of descriptors in image $d$

$L$: Number of levels in the spatial pyramid layout

$h^l(d_{r_i})$: BOW histogram vector for image $d$ at level $l$ and sub-region $r_i$

$c^l(d_{r_i})$: Colour moments vector for image $d$ at level $l$ and sub-region $r_i$

$m$: Number of images in the training image dataset

$T$: Real-valued threshold vector

$T^l_{r_i}$: Average density of keypoints at level $l$ and image sub-region $r_i$ over the $m$ training images

$H(d)$: Feature vector for image $d$, resulting from the concatenation of the BOW histograms and the weighted pyramidal colour moments

$w$: Weight vector indicating the importance of colour information

$K$: Kernel function
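The pyramidal colour features and the keypoint density thresholds summarised by the symbols above can be sketched as follows. This is a hypothetical illustration, not the paper's method: the exact weighting rule is not given in this section, so the code assumes a sub-region whose keypoint density is at or below its threshold $T^l_{r_i}$ gets a hypothetical colour weight `w_sparse`, and a denser sub-region gets `w_dense`. The intuition assumed here is that keypoint-poor regions (sky, water) are weakly described by BOW, so colour should count more. It also assumes a $2^l \times 2^l$ grid at level $l$ and the common first three colour moments (mean, standard deviation, cube root of the third central moment) per RGB channel. All names and default values are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[float, float, float]
Image = List[List[Pixel]]                  # image[y][x] -> (R, G, B)
Keypoints = Sequence[Tuple[float, float]]  # (x, y) keypoint locations
Region = Tuple[int, int, int, int, int]    # (level, x0, y0, x1, y1)


def pyramid_regions(width: int, height: int, levels: int) -> List[Region]:
    """Sub-regions r_i: a 2^l x 2^l grid at each level l < levels (image must be large enough)."""
    regions: List[Region] = []
    for level in range(levels):
        n = 2 ** level
        for gy in range(n):
            for gx in range(n):
                regions.append((level, gx * width // n, gy * height // n,
                                (gx + 1) * width // n, (gy + 1) * height // n))
    return regions


def colour_moments(img: Image, x0: int, y0: int, x1: int, y1: int) -> List[float]:
    """Mean, standard deviation and cube root of the third central moment per channel."""
    pixels = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    feats: List[float] = []
    for ch in range(3):
        vals = [p[ch] for p in pixels]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        third = sum((v - mean) ** 3 for v in vals) / len(vals)
        feats += [mean, math.sqrt(var), math.copysign(abs(third) ** (1 / 3), third)]
    return feats


def keypoint_density(kps: Keypoints, x0: int, y0: int, x1: int, y1: int) -> float:
    """Keypoints per pixel inside the half-open box [x0, x1) x [y0, y1)."""
    inside = sum(1 for x, y in kps if x0 <= x < x1 and y0 <= y < y1)
    return inside / ((x1 - x0) * (y1 - y0))


def density_thresholds(train: List[Tuple[Image, Keypoints]], levels: int) -> List[float]:
    """T^l_{r_i}: mean keypoint density of each pyramid cell over the m training images."""
    if not train:
        raise ValueError("need at least one training image")
    totals: Optional[List[float]] = None
    for img, kps in train:
        cells = pyramid_regions(len(img[0]), len(img), levels)
        dens = [keypoint_density(kps, *cell[1:]) for cell in cells]
        totals = dens if totals is None else [a + b for a, b in zip(totals, dens)]
    return [t / len(train) for t in totals]


def weighted_pyramid_colour(img: Image, kps: Keypoints, thresholds: Sequence[float],
                            levels: int, w_sparse: float = 1.0,
                            w_dense: float = 0.5) -> List[float]:
    """Concatenate each cell's colour moments, scaled by a density-based weight.

    HYPOTHETICAL rule: cells at or below their threshold get w_sparse, others w_dense.
    """
    out: List[float] = []
    for cell, t in zip(pyramid_regions(len(img[0]), len(img), levels), thresholds):
        _, x0, y0, x1, y1 = cell
        weight = w_sparse if keypoint_density(kps, x0, y0, x1, y1) <= t else w_dense
        out += [weight * m for m in colour_moments(img, x0, y0, x1, y1)]
    return out
```

Under these assumptions, $H(d)$ would be formed by concatenating the pyramid BOW histograms $h^l(d_{r_i})$ with the vector returned by `weighted_pyramid_colour`, and the result would be fed to an SVM with kernel $K$.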


Author information

Correspondence to Yousef Alqasrawi.


About this article

Cite this article

Alqasrawi, Y., Neagu, D. & Cowling, P.I. Fusing integrated visual vocabularies-based bag of visual words and weighted colour moments on spatial pyramid layout for natural scene image classification. SIViP 7, 759–775 (2013). https://doi.org/10.1007/s11760-011-0266-0

DOI: https://doi.org/10.1007/s11760-011-0266-0
