Scene recognition combining structural and textural features

  • Research Papers
Science China Information Sciences

Abstract

Automatic recognition of the contents of a scene is an important problem in computer vision. Although considerable progress has been made, the complexity of natural scenes remains a major challenge. Most previous scene-recognition approaches are based on the so-called “bag of visual words” model, which uses a clustering method to quantize a large number of local region descriptors into a codebook. The performance of such approaches depends strongly on the size of the codebook and on the choice of initial cluster centers, and a large codebook incurs high computational cost and memory consumption. To overcome these weaknesses, we present an unsupervised natural-scene recognition approach that does not rely on the “bag of visual words” model. The approach constructs multiple images at different resolutions and extracts structural and textural features from them. The structural features are represented by a descriptor based on weighted histograms of gradient orientation, proposed in this paper, and the textural features are represented by the responses of Gabor filters and a Schmid filter set. We treat the structural and textural features as two independent feature channels and combine them to categorize scenes automatically with a support vector machine. We evaluate the approach on three commonly used datasets covering a variety of scene categories. The experiments show that the proposed weighted-histogram descriptor outperforms the classical scale-invariant feature transform (SIFT) descriptor for natural-scene recognition, and that the overall approach performs competitively with current state-of-the-art methods.
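The abstract does not spell out how the weighted gradient-orientation histograms are computed. The sketch below is only an illustration under assumptions of our own, modelled on the familiar histogram-of-oriented-gradients idea: a three-level pyramid built by 2×2 averaging, unsigned gradient orientations quantized into 8 bins, each pixel's vote weighted by its gradient magnitude, histograms pooled over a 4×4 grid, and each level L2-normalized. The function names (`image_pyramid`, `weighted_orientation_histogram`, `structural_features`) and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np


def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (an odd last row/column is dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])


def image_pyramid(img, levels=3):
    """Return the image at `levels` resolutions, finest first (hypothetical helper)."""
    pyramid = [img.astype(float)]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def weighted_orientation_histogram(img, bins=8, grid=4):
    """Magnitude-weighted histograms of unsigned gradient orientation on a grid x grid layout.

    Assumption: "weighted" is read here as weighting each pixel's vote by its gradient magnitude.
    """
    gy, gx = np.gradient(img)                   # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)     # unsigned orientation in [0, pi)
    idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    h, w = img.shape
    cells = []
    for i in range(grid):
        for j in range(grid):
            rows = slice(i * h // grid, (i + 1) * h // grid)
            cols = slice(j * w // grid, (j + 1) * w // grid)
            cells.append(np.bincount(idx[rows, cols].ravel(),
                                     weights=mag[rows, cols].ravel(),
                                     minlength=bins))
    v = np.concatenate(cells)
    return v / (np.linalg.norm(v) + 1e-12)      # L2 normalization per pyramid level


def structural_features(img, levels=3, bins=8, grid=4):
    """Concatenate the per-level descriptors of a multi-resolution pyramid."""
    return np.concatenate([weighted_orientation_histogram(p, bins, grid)
                           for p in image_pyramid(img, levels)])
```

For a 64×64 grayscale array, `structural_features(img)` returns a 3 × 16 × 8 = 384-dimensional vector. Weighting by magnitude lets strong edges dominate each cell's histogram while weak, noisy gradients contribute little; this is one plausible reading of “weighted”, not necessarily the authors' definition.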

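Likewise, the exact filter parameters and pooling used for the textural channel are not given in the abstract. The sketch below assumes a small bank of real Gabor filters (two wavelengths λ, four orientations, with σ = 0.56λ) plus the 13 rotation-invariant filters of Schmid's set, F(r, σ, τ) = F0(σ, τ) + cos(πτr/σ) exp(−r²/(2σ²)), where F0 removes the mean, using the (σ, τ) pairs commonly quoted for that set. Each filter response is summarized by the mean and standard deviation of its magnitude, and convolution is done with NumPy's FFT to avoid extra dependencies. All names and parameter choices here are hypothetical.

```python
import numpy as np

# (sigma, tau) pairs commonly used for Schmid's 13-filter set (assumed, not from the paper).
SCHMID_PARAMS = [(2, 1), (4, 1), (4, 2), (6, 1), (6, 2), (6, 3), (8, 1),
                 (8, 2), (8, 3), (10, 1), (10, 2), (10, 3), (10, 4)]


def convolve_same(img, kernel):
    """2-D linear convolution via zero-padded FFT, cropped to the input size ('same' mode)."""
    kh, kw = kernel.shape
    H, W = img.shape[0] + kh - 1, img.shape[1] + kw - 1
    full = np.real(np.fft.ifft2(np.fft.fft2(img, (H, W)) * np.fft.fft2(kernel, (H, W))))
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return full[top:top + img.shape[0], left:left + img.shape[1]]


def gabor_kernel(sigma, theta, wavelength, gamma=0.5, half=None):
    """Real (cosine) Gabor kernel at orientation theta; DC removed so flat regions give ~0."""
    half = half or int(np.ceil(3 * sigma))
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * xr / wavelength)
    return g - g.mean()


def schmid_kernel(sigma, tau, half=None):
    """Isotropic Schmid filter: cos(pi*tau*r/sigma) * Gaussian, made zero-mean and L1-normalized."""
    half = half or int(np.ceil(3 * sigma))
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    r = np.hypot(x, y)
    f = np.cos(np.pi * tau * r / sigma) * np.exp(-r ** 2 / (2 * sigma ** 2))
    f -= f.mean()                       # the F0 term: enforce zero mean
    return f / np.abs(f).sum()


def textural_features(img, n_orient=4, wavelengths=(4, 8)):
    """Mean and std of |response| for every filter in the (hypothetical) Gabor + Schmid bank."""
    img = img.astype(float)
    bank = [gabor_kernel(0.56 * lam, k * np.pi / n_orient, lam)
            for lam in wavelengths for k in range(n_orient)]
    bank += [schmid_kernel(s, t) for s, t in SCHMID_PARAMS]
    feats = []
    for kern in bank:
        resp = np.abs(convolve_same(img, kern))
        feats += [resp.mean(), resp.std()]
    return np.array(feats)
```

With the defaults this yields a 2 × (8 + 13) = 42-dimensional vector per image, which could be computed at each pyramid level and used as the second channel. One simple way to combine the two channels, assumed here rather than taken from the paper, is to compute a kernel matrix per channel, average them, and pass the result to an SVM that accepts precomputed kernels (for example LIBSVM's `-t 4` option).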
Author information

Correspondence to DeWen Hu.

About this article

Cite this article

Zhou, L., Hu, D. & Zhou, Z. Scene recognition combining structural and textural features. Sci. China Inf. Sci. 56, 1–14 (2013). https://doi.org/10.1007/s11432-011-4421-6

  • DOI: https://doi.org/10.1007/s11432-011-4421-6
