Recovering Surface Layout from an Image

Hoiem, Derek; Efros, Alexei A.; Hebert, Martial

doi:10.1007/s11263-006-0031-y

Published: 16 February 2007

Volume 75, pages 151–172, (2007)

International Journal of Computer Vision

Abstract

Humans have an amazing ability to instantly grasp the overall 3D structure of a scene—ground orientation, relative positions of major landmarks, etc.—even from a single image. This ability is completely missing in most popular recognition algorithms, which pretend that the world is flat and/or view it through a patch-sized peephole. Yet it seems very likely that having a grasp of this “surface layout” of a scene should be of great assistance for many tasks, including recognition, navigation, and novel view synthesis.

In this paper, we take the first step towards constructing the surface layout, a labeling of the image into geometric classes. Our main insight is to learn appearance-based models of these geometric classes, which coarsely describe the 3D scene orientation of each image region. Our multiple segmentation framework provides robust spatial support, allowing a wide variety of cues (e.g., color, texture, and perspective) to contribute to the confidence in each geometric label. In experiments on a large set of outdoor images, we evaluate the impact of the individual cues and design choices in our algorithm. We further demonstrate the applicability of our method to indoor images, describe potential applications, and discuss extensions to a more complete notion of surface layout.
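The idea of pooling label confidences over multiple segmentations can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how confidences are combined, so uniform averaging is an assumption here, as are the label names, the function names, and the toy confidence values (which stand in for the output of a learned appearance model over color, texture, and perspective cues).

```python
from typing import Dict, List, Sequence

# Illustrative coarse geometric classes (assumed names, not taken from the abstract).
LABELS = ("support", "vertical", "sky")


def combine_segmentations(
    segmentations: Sequence[Sequence[int]],
    segment_confidences: Sequence[Dict[int, Sequence[float]]],
) -> List[List[float]]:
    """Average per-segment label confidences over several segmentations.

    segmentations[k][p] is the id of the segment containing pixel p in
    segmentation k. segment_confidences[k][s] is a distribution over LABELS
    for segment s of segmentation k (hypothetical classifier output).
    Uniform averaging is an assumption made for this sketch.
    """
    num_pixels = len(segmentations[0])
    num_segmentations = len(segmentations)
    combined = [[0.0] * len(LABELS) for _ in range(num_pixels)]
    for seg, conf in zip(segmentations, segment_confidences):
        for p, seg_id in enumerate(seg):
            for j, c in enumerate(conf[seg_id]):
                combined[p][j] += c / num_segmentations
    return combined


def most_likely_labels(combined: Sequence[Sequence[float]]) -> List[str]:
    """Pick the label with the highest combined confidence at each pixel."""
    return [LABELS[max(range(len(LABELS)), key=row.__getitem__)] for row in combined]


# Toy example: a 4-pixel "image" segmented in two different ways.
segmentations = [[0, 0, 1, 1], [0, 1, 1, 1]]
segment_confidences = [
    {0: [0.8, 0.1, 0.1], 1: [0.1, 0.2, 0.7]},
    {0: [0.9, 0.05, 0.05], 1: [0.2, 0.6, 0.2]},
]
combined = combine_segmentations(segmentations, segment_confidences)
labels = most_likely_labels(combined)
```

Because each pixel belongs to a different segment in each segmentation, a poor grouping in one segmentation can be outvoted by better groupings in the others, which is one plausible reading of the "robust spatial support" the abstract describes.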

Author information

Authors and Affiliations

Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Derek Hoiem, Alexei A. Efros & Martial Hebert

Corresponding author

Correspondence to Derek Hoiem.

About this article

Cite this article

Hoiem, D., Efros, A.A. & Hebert, M. Recovering Surface Layout from an Image. Int J Comput Vis 75, 151–172 (2007). https://doi.org/10.1007/s11263-006-0031-y

Received: 28 April 2006
Accepted: 19 December 2006
Published: 16 February 2007
Issue Date: October 2007
DOI: https://doi.org/10.1007/s11263-006-0031-y
