Recovering Surface Layout from an Image

International Journal of Computer Vision

Abstract

Humans have an amazing ability to instantly grasp the overall 3D structure of a scene—ground orientation, relative positions of major landmarks, etc.—even from a single image. This ability is completely missing in most popular recognition algorithms, which pretend that the world is flat and/or view it through a patch-sized peephole. Yet it seems very likely that having a grasp of this “surface layout” of a scene should be of great assistance for many tasks, including recognition, navigation, and novel view synthesis.

In this paper, we take the first step towards constructing the surface layout, a labeling of the image into geometric classes. Our main insight is to learn appearance-based models of these geometric classes, which coarsely describe the 3D scene orientation of each image region. Our multiple segmentation framework provides robust spatial support, allowing a wide variety of cues (e.g., color, texture, and perspective) to contribute to the confidence in each geometric label. In experiments on a large set of outdoor images, we evaluate the impact of the individual cues and design choices in our algorithm. We further demonstrate the applicability of our method to indoor images, describe potential applications, and discuss extensions to a more complete notion of surface layout.
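The idea of pooling label confidence over several segmentations can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each segmentation assigns every pixel to a segment, that some learned classifier has already scored each segment for each geometric label, and that the scores are combined by a plain average. The function name, the data layout, and the label names are all hypothetical, and the abstract does not specify how the cues are actually combined.

```python
def pixel_label_confidences(segmentations, segment_scores, labels):
    """Average per-segment label confidences over several segmentations.

    segmentations:  list of lists; segmentations[k][p] is the segment id of
                    pixel p in the k-th segmentation (hypothetical layout).
    segment_scores: list of dicts; segment_scores[k][seg_id][label] is a
                    confidence in [0, 1], assumed to come from a classifier.
    labels:         names of the geometric classes (illustrative only).
    """
    n_pixels = len(segmentations[0])
    n_segmentations = len(segmentations)
    result = []
    for p in range(n_pixels):
        totals = {label: 0.0 for label in labels}
        # Each segmentation votes with the score of the segment containing p.
        for seg_map, scores in zip(segmentations, segment_scores):
            seg_id = seg_map[p]
            for label in labels:
                totals[label] += scores[seg_id][label]
        # Plain averaging is an assumption made for this sketch.
        result.append({label: totals[label] / n_segmentations for label in labels})
    return result


# Toy example: 4 pixels, 2 segmentations, 3 illustrative labels.
labels = ["support", "vertical", "sky"]
segmentations = [[0, 0, 1, 1], [0, 1, 1, 1]]
segment_scores = [
    {0: {"support": 0.8, "vertical": 0.1, "sky": 0.1},
     1: {"support": 0.2, "vertical": 0.6, "sky": 0.2}},
    {0: {"support": 0.9, "vertical": 0.05, "sky": 0.05},
     1: {"support": 0.3, "vertical": 0.5, "sky": 0.2}},
]
confidences = pixel_label_confidences(segmentations, segment_scores, labels)
```

In this toy example the second pixel falls in different segments in the two segmentations, so its confidence blends both hypotheses. No single segmentation has to be correct for the result to be usable.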


Author information

Corresponding author

Correspondence to Derek Hoiem.

About this article

Cite this article

Hoiem, D., Efros, A.A. & Hebert, M. Recovering Surface Layout from an Image. Int J Comput Vis 75, 151–172 (2007). https://doi.org/10.1007/s11263-006-0031-y

  • DOI: https://doi.org/10.1007/s11263-006-0031-y
