Extracting View-Dependent Depth Maps from a Collection of Images

Kang, Sing Bing; Szeliski, Richard

doi:10.1023/B:VISI.0000015917.35451.df

Published: July 2004

Volume 58, pages 139–163, (2004)

International Journal of Computer Vision

Abstract

Stereo correspondence algorithms typically produce a single depth map. In addition to the usual problems of occlusions and textureless regions, such algorithms cannot model the variation in scene or object appearance with respect to the viewing position. In this paper, we propose a new representation that overcomes the appearance variation problem associated with an image sequence. Rather than estimating a single depth map, we associate a depth map with each input image (or a subset of them). Our representation is motivated by applications such as view interpolation and depth-based segmentation for model-building or layer extraction. We describe two approaches to extract such a representation from a sequence of images.

The first approach, which is more classical, computes the local depth map associated with each chosen reference frame independently. The novelty of this approach lies in its combination of shiftable windows, temporal selection, and graph cut optimization. The second approach simultaneously optimizes a set of self-consistent depth maps at multiple key-frames. Since multiple depth maps are estimated simultaneously, visibility can be modeled explicitly and disparity consistency imposed across the different depth maps. Results, which include a difficult specular scene example, show the effectiveness of our approach.
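
The ingredients named for the first approach can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes 1-D scanlines, integer disparities, and a mean absolute-difference cost, and it replaces graph cut optimization with a plain winner-take-all choice per pixel. All function names, the `OUT_OF_VIEW` penalty, and the `offsets` convention (a point at `x` in the reference appears at `x + offset * d` in a neighbor) are hypothetical.

```python
from typing import List, Sequence

OUT_OF_VIEW = 255.0  # assumed penalty when a pixel projects outside a neighbor


def box_mean(costs: Sequence[float], radius: int) -> List[float]:
    """Average costs over a centered window, clipped at the scanline ends."""
    n = len(costs)
    out = []
    for i in range(n):
        window = costs[max(0, i - radius): min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def shiftable_window(costs: Sequence[float], radius: int) -> List[float]:
    """For each pixel, keep the lowest aggregated cost among all windows
    (centered within `radius` of the pixel) that still contain it."""
    agg = box_mean(costs, radius)
    n = len(agg)
    return [min(agg[max(0, i - radius): min(n, i + radius + 1)]) for i in range(n)]


def temporal_selection(frame_costs: Sequence[float], keep: int) -> float:
    """Sum only the `keep` lowest per-frame costs, so frames in which the
    pixel is probably occluded do not contribute."""
    return sum(sorted(frame_costs)[:keep])


def depth_for_reference(ref: Sequence[float],
                        neighbors: Sequence[Sequence[float]],
                        offsets: Sequence[int],
                        max_disp: int,
                        radius: int = 1,
                        keep: int = 1) -> List[int]:
    """Winner-take-all disparity per pixel of one reference scanline.
    (Stand-in for graph cut optimization, which is not implemented here.)"""
    n = len(ref)
    best_cost = [float("inf")] * n
    best_disp = [0] * n
    for d in range(max_disp + 1):
        per_frame = []
        for img, off in zip(neighbors, offsets):
            raw = []
            for x in range(n):
                j = x + off * d  # where pixel x would land in this neighbor
                raw.append(abs(ref[x] - img[j]) if 0 <= j < n else OUT_OF_VIEW)
            per_frame.append(shiftable_window(raw, radius))
        for x in range(n):
            cost = temporal_selection([f[x] for f in per_frame], keep)
            if cost < best_cost[x]:
                best_cost[x], best_disp[x] = cost, d
    return best_disp


# Toy example: a scanline seen from a left and a right neighbor at disparity 2.
ref = [10, 40, 20, 80, 30, 90, 50, 70, 60, 0]
left = ref[2:] + [-100, -100]    # content shifted by -2, padded at the end
right = [-100, -100] + ref[:-2]  # content shifted by +2, padded at the start
disparities = depth_for_reference(ref, [left, right], [-1, 1], max_disp=3)
```

In this toy setup each border pixel is out of view in one neighbor but matches exactly in the other, so keeping only the best frame (`keep=1`) recovers the correct disparity along the whole scanline. Running the same function once per chosen reference frame would give one depth map per frame, in the spirit of the first approach.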

Author information

Authors and Affiliations

Microsoft Research, One Microsoft Way, Redmond, WA, 98052, USA
Sing Bing Kang & Richard Szeliski

About this article

Cite this article

Kang, S.B., Szeliski, R. Extracting View-Dependent Depth Maps from a Collection of Images. International Journal of Computer Vision 58, 139–163 (2004). https://doi.org/10.1023/B:VISI.0000015917.35451.df

Issue Date: July 2004
DOI: https://doi.org/10.1023/B:VISI.0000015917.35451.df
