Abstract
Automatic recognition of the contents of a scene is a central problem in computer vision. Although considerable progress has been made, the complexity of scenes remains a major challenge. Most previous approaches to scene recognition are based on the so-called “bag of visual words” model, which uses clustering methods to quantize numerous local region descriptors into a codebook. Recognition performance depends strongly on the size of the codebook and on the selection of the initial cluster centers. Furthermore, a large codebook leads to high computational cost and large memory consumption. To overcome these weaknesses, we present an unsupervised natural scene recognition approach that is not based on the “bag of visual words” model. The approach constructs multiple images of different resolutions and extracts structural and textural features from them. The structural features are represented by the weighted histograms of gradient orientation descriptor, which is introduced in this paper, and the textural features are represented by the responses of Gabor filters and a Schmid filter set. We treat the structural and textural features as two independent feature channels and combine them to categorize scenes automatically using a support vector machine. We evaluate the approach on three commonly used datasets with various scene categories. Our experiments demonstrate that the weighted histograms of gradient orientation descriptor outperforms the classical scale-invariant feature transform descriptor in natural scene recognition, and that our approach compares well with current state-of-the-art methods.
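The structural channel described above can be illustrated with a minimal sketch. It is not the descriptor defined in the paper: the abstract does not specify how the histograms are weighted or how the multi-resolution images are built, so this example assumes gradient-magnitude weighting, unsigned orientation bins, and a pyramid formed by 2x2 block averaging. The function names `downsample`, `weighted_orientation_histogram`, and `multiresolution_descriptor` are hypothetical.

```python
import math


def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (assumed pyramid scheme)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
             + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def weighted_orientation_histogram(img, n_bins=8):
    """Histogram of unsigned gradient orientations over interior pixels.

    Each pixel votes for its orientation bin with a weight equal to its
    gradient magnitude (an assumed weighting, not taken from the paper).
    The histogram is normalized to sum to 1 when any gradient is present.
    """
    h, w = len(img), len(img[0])
    hist = [0.0] * n_bins
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # Central differences in x and y.
            gx = (img[r][c + 1] - img[r][c - 1]) / 2.0
            gy = (img[r + 1][c] - img[r - 1][c]) / 2.0
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Fold the angle into [0, pi) so opposite directions share a bin.
            ang = math.atan2(gy, gx) % math.pi
            b = min(int(ang / math.pi * n_bins), n_bins - 1)
            hist[b] += mag
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist


def multiresolution_descriptor(img, levels=3, n_bins=8):
    """Concatenate one weighted histogram per resolution level."""
    feats = []
    cur = img
    for _ in range(levels):
        feats.extend(weighted_orientation_histogram(cur, n_bins))
        cur = downsample(cur)
    return feats


if __name__ == "__main__":
    # A horizontal intensity ramp: all gradients point along x.
    ramp = [[float(c) for c in range(16)] for _ in range(16)]
    print(multiresolution_descriptor(ramp, levels=3, n_bins=8))
```

In a full pipeline, a vector like this would form one feature channel, with Gabor and Schmid filter responses forming the other, before both are passed to a support vector machine.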
Cite this article
Zhou, L., Hu, D. & Zhou, Z. Scene recognition combining structural and textural features. Sci. China Inf. Sci. 56, 1–14 (2013). https://doi.org/10.1007/s11432-011-4421-6
DOI: https://doi.org/10.1007/s11432-011-4421-6