Extracting View-Dependent Depth Maps from a Collection of Images

International Journal of Computer Vision

Abstract

Stereo correspondence algorithms typically produce a single depth map. In addition to the usual problems of occlusions and textureless regions, such algorithms cannot model the variation in scene or object appearance with respect to the viewing position. In this paper, we propose a new representation that overcomes the appearance variation problem associated with an image sequence. Rather than estimating a single depth map, we associate a depth map with each input image (or a subset of them). Our representation is motivated by applications such as view interpolation and depth-based segmentation for model-building or layer extraction. We describe two approaches to extract such a representation from a sequence of images.

The first approach, which is more classical, computes the local depth map associated with each chosen reference frame independently. The novelty of this approach lies in its combination of shiftable windows, temporal selection, and graph cut optimization. The second approach simultaneously optimizes a set of self-consistent depth maps at multiple key-frames. Since multiple depth maps are estimated simultaneously, visibility can be modeled explicitly and disparity consistency imposed across the different depth maps. Results, which include a difficult specular scene example, show the effectiveness of our approach.
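
The ingredients named for the first approach can be illustrated with a small sketch. This is not the authors' implementation; it rests on several assumptions made only for illustration: images are reduced to 1D scanlines, disparities are integers, each neighbouring frame `k` sees a reference pixel `x` at `x + offsets[k] * d`, "temporal selection" is read as keeping only the best-matching subset of neighbouring frames per pixel, and a plain winner-take-all choice stands in for graph cut optimization. All function and parameter names are hypothetical.

```python
BIG = 1e9  # assumed penalty when a pixel projects outside a neighbouring scanline


def per_frame_cost(ref, nbr, shift):
    """Squared difference between ref[x] and nbr[x + shift] for every x."""
    return [(ref[x] - nbr[x + shift]) ** 2 if 0 <= x + shift < len(nbr) else BIG
            for x in range(len(ref))]


def box_mean(cost, radius):
    """Average the cost over a window centred on each pixel (clamped at borders)."""
    n = len(cost)
    out = []
    for x in range(n):
        window = cost[max(0, x - radius):min(n, x + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def shiftable(agg, radius):
    """Shiftable window: best score among all centred windows that contain x."""
    n = len(agg)
    return [min(agg[max(0, x - radius):min(n, x + radius + 1)]) for x in range(n)]


def temporal_select(frame_costs, keep):
    """Per pixel, sum only the `keep` lowest costs across neighbouring frames."""
    n = len(frame_costs[0])
    return [sum(sorted(c[x] for c in frame_costs)[:keep]) for x in range(n)]


def depth_for_reference(ref, neighbours, offsets, max_disp, radius=1, keep=None):
    """Winner-take-all disparity per pixel of one reference scanline."""
    keep = keep or max(1, len(neighbours) // 2)  # default: best half of the frames
    n = len(ref)
    best_cost = [float("inf")] * n
    best_disp = [0] * n
    for d in range(max_disp + 1):
        frames = []
        for nbr, k in zip(neighbours, offsets):
            raw = per_frame_cost(ref, nbr, k * d)
            frames.append(shiftable(box_mean(raw, radius), radius))
        total = temporal_select(frames, keep)
        for x in range(n):
            if total[x] < best_cost[x]:
                best_cost[x], best_disp[x] = total[x], d
    return best_disp
```

In this sketch, a pixel near the scanline border is typically visible in only one of two flanking frames; with `keep` set to the number of frames, the out-of-view frame's penalty dominates the sum, whereas keeping only the best frame lets the visible one decide. Running the function once per chosen reference frame yields one depth map per frame, in the spirit of the representation described above.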

Cite this article

Kang, S.B., Szeliski, R. Extracting View-Dependent Depth Maps from a Collection of Images. International Journal of Computer Vision 58, 139–163 (2004). https://doi.org/10.1023/B:VISI.0000015917.35451.df

  • DOI: https://doi.org/10.1023/B:VISI.0000015917.35451.df
