Hierarchical Space Tiling for Scene Modeling

Wang, Shuo; Wang, Yizhou; Zhu, Song-Chun

doi:10.1007/978-3-642-37444-9_62

Shuo Wang^20,21,
Yizhou Wang²⁰ &
Song-Chun Zhu²¹

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 7725))

Included in the following conference series:

Asian Conference on Computer Vision

3938 Accesses
6 Citations

Abstract

A typical scene category, e.g., street and beach, contains an enormous number (e.g., in the order of 10⁴ to 10⁵) of distinct scene configurations that are composed of objects and regions of varying shapes in different layouts. A well-known representation that can effectively address such complexity is the family of compositional models; however, learning the structures of the hierarchical compositional models remains a challenging task in vision. The objective of this paper is to present an efficient method for learning such models from a set of scene configurations. We start with an over-complete representation called Hierarchical Space Tiling (HST), which quantizes the huge and continuous scene configuration space in an And-Or tree (AOT). This hierarchical AOT can generate a combinatorial number of configurations (in the order of 10³¹) through a small dictionary of elements. Then we estimate the HST/AOT model through a learning-by-parsing strategy, which iteratively updates the HST/AOT parameters while constructing the optimal parse trees for each training configuration. Finally we prune out the branches with zero or low probability to obtain a much smaller HST/AOT. The HST quantization allows us to transfer the challenging structure-learning problem to a tractable parameter-learning problem. We evaluate the representation in three aspects. (i) Coding efficiency. We show the learned representation can approximate valid configurations with less errors using smaller number of primitives than other popular representations. (ii) Semantic power of learning. The learned representation is less ambiguous in parsing configuration and has semantically meaningful inner concepts. It captures both the diversity and the frequency (prior) of the scene configurations. (iii) Scene classification. The model is not only fully generative but also yields discriminative scene classification performance which outperforms the state-of-the-art methods.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Berg, M., Cheong, O., Kreveld, M., Overmars, M.: Computational Geometry: Algorithms and Applications, 3rd edn. Springer (2008)
Google Scholar
Fei-Fei, L., Perona, P.: A Bayesian hierarchical model for learning natural scene categories. In: CVPR, pp. 524–531 (2005)
Google Scholar
Hinton, G.E., Osindero, S., Teh, Y.: A fast learning algorithm for deep belief nets. Neural Computation (2006)
Google Scholar
Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: CVPR, pp. 2169–2178 (2006)
Google Scholar
Liu, C., Yuen, J., Torralba, A.: Nonparametric scene parsing: label transfer via dense scene alignment. In: CVPR (2009)
Google Scholar
Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. IJCV, 145–175 (2001)
Google Scholar
Parizi, S.N., Oberlin, J., Felzenszwalb, P.: Reconfigurable models for scene recognition. In: CVPR (2012)
Google Scholar
Russell, B.C., Torralba, A., Murphy, K.P., Freeman, W.T.: LabelMe: a database and web-based tool for image annotation. IJCV, 157–173 (2008)
Google Scholar
Socher, R., Lin, C., Ng, A., Manning, C.: Parsing natural scenes and natural language with recursive neural networks. In: ICML (2011)
Google Scholar
Viterbi, A.J.: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory (1967)
Google Scholar
Wang, J., Yang, J., Lv, F., Huang, T., Gong, Y.: Locality-constrained linear coding for image classification. In: CVPR (2010)
Google Scholar
Zhu, J., Wu, T.F., Zhu, S.C., Yang, X.K., Zhang, W.J.: Learning reconfigurable scene representation by Tangram Model. In: Workshop on Application of Computer Vision (2012)
Google Scholar
Zhu, S.C., Mumford, D.: A stochastic grammar of images. Foundations and Trends in Computer Graphics and Vision, 259–362 (2006)
Google Scholar

Download references

Author information

Authors and Affiliations

Nat’l Engineering Lab for Video Technology Key Lab. of Machine Perception, Department of EECS, Peking University, Beijing, China
Shuo Wang & Yizhou Wang
Center for Vision, Cognition, Learning and Arts (VCLA), Department of Statistics, University of California, Los Angeles, U.S.
Shuo Wang & Song-Chun Zhu

Authors

Shuo Wang
View author publications
You can also search for this author in PubMed Google Scholar
Yizhou Wang
View author publications
You can also search for this author in PubMed Google Scholar
Song-Chun Zhu
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, 151-744, Seoul, Korea
Kyoung Mu Lee
Microsoft Research Asia, No. 5, Danling st., Haidian district, 100080, Beijing, P.R. China
Yasuyuki Matsushita
School of Interactive Computing, Georgia Institute of Technology, 801 Atlantic Drive, CCB 315, 30332, Atlanta, GA, USA
James M. Rehg
Institute of Automation, National Laboratory of Pattern Recognition, Chinese Academy of Sciences, Zhong Quan Cun East Road 95, Haidian District, 100 190, Beijing, P.R. China
Zhanyi Hu

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Wang, S., Wang, Y., Zhu, SC. (2013). Hierarchical Space Tiling for Scene Modeling. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds) Computer Vision – ACCV 2012. ACCV 2012. Lecture Notes in Computer Science, vol 7725. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37444-9_62

Download citation

DOI: https://doi.org/10.1007/978-3-642-37444-9_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37443-2
Online ISBN: 978-3-642-37444-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics