Abstract
Traditional scene categorization methods tend to represent a scene holistically, as a distribution of the visual words observed in the image. They disregard spatial information within the scene and therefore cannot distinguish categories that share similar sub-scenes but differ in layout, or categories that are ambiguous by nature. To address this issue, we propose incorporating sub-scene attributes into the global description to improve categorization performance, especially in ambiguous cases. This is achieved by encoding sub-scenes with layout prototypes that capture the geometric essence of scenes more accurately and flexibly. The proposed method improves categorization accuracy to 92.26% on the widely used eight-scene dataset and outperforms all other published methods. It is also observed that the proposed method is more accurate at detecting and evaluating ambiguous images.
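The limitation of a purely holistic visual-word distribution can be illustrated with a toy sketch. It is not the proposed layout-prototype encoding: the fixed horizontal bands below are only a hypothetical stand-in for sub-scenes, and the function names, grid sizes, and word labels are all assumptions made for illustration.

```python
from collections import Counter


def global_histogram(word_grid, vocab_size):
    """Normalized frequency of each visual word over the whole grid."""
    counts = Counter(w for row in word_grid for w in row)
    total = sum(counts.values())
    return [counts.get(w, 0) / total for w in range(vocab_size)]


def banded_histogram(word_grid, vocab_size, n_bands):
    """Concatenate one histogram per horizontal band.

    Hypothetical stand-in for a sub-scene encoding; rows left over when
    the grid height is not divisible by n_bands are ignored.
    """
    rows_per_band = len(word_grid) // n_bands
    features = []
    for b in range(n_bands):
        band = word_grid[b * rows_per_band:(b + 1) * rows_per_band]
        features.extend(global_histogram(band, vocab_size))
    return features


# Toy 4x4 grids of visual-word indices (made-up labels):
# 0 = "sky-like" patch, 1 = "water-like" patch.
scene_a = [[0] * 4, [0] * 4, [1] * 4, [1] * 4]  # sky above water
scene_b = [[1] * 4, [1] * 4, [0] * 4, [0] * 4]  # same patches, flipped layout

same_global = global_histogram(scene_a, 2) == global_histogram(scene_b, 2)
same_banded = banded_histogram(scene_a, 2, 2) == banded_histogram(scene_b, 2, 2)
print(same_global, same_banded)  # prints: True False
```

The two toy scenes contain the same visual words in different positions, so the global histogram cannot separate them, while even a crude region-wise description can.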
Acknowledgments
This work was supported in part by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China, under Project HKU718912E, and in part by the Postgraduate Studentship of the University of Hong Kong.
Cite this article
Zhu, Ss., Yung, N.H.C. Improve scene categorization via sub-scene recognition. Machine Vision and Applications 25, 1561–1572 (2014). https://doi.org/10.1007/s00138-014-0622-5
DOI: https://doi.org/10.1007/s00138-014-0622-5