Scene recognition combining structural and textural features

Zhou, Li; Hu, DeWen; Zhou, ZongTan

doi:10.1007/s11432-011-4421-6

Research Papers
Published: 17 October 2011

Volume 56, pages 1–14, (2013)

Science China Information Sciences

Li Zhou¹,
DeWen Hu¹ &
ZongTan Zhou¹

Abstract

Automatic recognition of the contents of a scene is an important issue in the field of computer vision. Although considerable progress has been made, the complexity of scenes remains a major challenge to computer vision research. Most previous approaches to scene recognition are based on the so-called “bag of visual words” model, which uses clustering methods to quantize numerous local region descriptors into a codebook. The size of the codebook and the selection of initial clustering centers greatly affect the performance. Furthermore, a large codebook leads to high computational costs and large memory consumption. To overcome these weaknesses, we present an unsupervised natural scene recognition approach that is not based on the “bag of visual words” model. This approach constructs multiple images of different resolutions and extracts structural and textural features from these images. The structural features are represented by weighted histograms of the gradient orientation descriptor, which is introduced in this paper, and the textural features are represented by the responses of Gabor filters and a Schmid set. We regard the structural and textural features as two independent feature channels, and combine them to realize automatic categorization of scenes using a support vector machine. We evaluate our approach on three commonly used datasets with various scene categories. Our experiments demonstrate that the weighted histograms of the gradient orientation descriptor outperform the classical scale invariant feature transform descriptor in natural-scene recognition, and that our approach achieves good performance with respect to current state-of-the-art methods.
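The structural channel described in the abstract (gradient-orientation histograms computed on images of several resolutions) can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme, the number of bins, or how the resolutions are built, so everything below is an assumption made for illustration: each pixel's vote is weighted by its gradient magnitude, resolution is halved by 2x2 block averaging, and the function names `downsample`, `orientation_histogram`, and `multires_descriptor` are hypothetical. It is not the authors' descriptor.

```python
import math

def downsample(img):
    """Halve the resolution by averaging non-overlapping 2x2 blocks (assumed scheme)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def orientation_histogram(img, bins=8):
    """Histogram of gradient orientations over interior pixels.

    Each pixel votes with its gradient magnitude (an assumed weighting);
    the histogram is L1-normalised when it is non-empty.
    """
    hist = [0.0] * bins
    h, w = len(img), len(img[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # central difference in x
            gy = img[y + 1][x] - img[y - 1][x]  # central difference in y
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue  # flat pixels carry no orientation
            angle = math.atan2(gy, gx) % (2 * math.pi)  # map to [0, 2*pi)
            b = min(int(angle / (2 * math.pi) * bins), bins - 1)
            hist[b] += mag
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist

def multires_descriptor(img, levels=3, bins=8):
    """Concatenate the orientation histograms of successively halved images."""
    feats = []
    for _ in range(levels):
        feats.extend(orientation_histogram(img, bins))
        img = downsample(img)
    return feats

# Example: a 16x16 horizontal intensity ramp has gradients pointing along +x only.
ramp = [[float(x) for x in range(16)] for _ in range(16)]
descriptor = multires_descriptor(ramp, levels=3, bins=8)
```

In a full pipeline of the kind the abstract outlines, a vector like `descriptor` would be paired with a separate textural feature vector (filter responses) and both would be passed to a support vector machine; those stages are omitted here.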

Author information

Authors and Affiliations

Department of Automatic Control, College of Mechatronics and Automation, National University of Defense Technology, Changsha, 410073, China
Li Zhou, DeWen Hu & ZongTan Zhou

Corresponding author

Correspondence to DeWen Hu.

About this article

Cite this article

Zhou, L., Hu, D. & Zhou, Z. Scene recognition combining structural and textural features. Sci. China Inf. Sci. 56, 1–14 (2013). https://doi.org/10.1007/s11432-011-4421-6

Received: 17 May 2011
Accepted: 15 August 2011
Published: 17 October 2011
Issue Date: July 2013
DOI: https://doi.org/10.1007/s11432-011-4421-6
