Abstract
A fundamental problem in computer vision and graphics is arbitrary view synthesis for static 3-D scenes, in which a view of the scene from a user-specified viewpoint is generated directly from a representation. We propose a novel compact representation for this purpose called the multivalued representation (MVR). Starting with an image sequence captured by a moving camera undergoing either unknown planar translation or orbital motion, an MVR is derived for each preselected reference frame and may then be used to synthesize arbitrary views of the scene. The representation comprises multiple depth and intensity levels, where the k-th level consists of points occluded by exactly k surfaces. To build an MVR with respect to a particular reference frame, dense depth maps are first computed for all the neighboring frames of the reference frame. The depth maps are then combined into a single map, where points are organized by occlusions rather than by coherent affine motions. This grouping facilitates an automatic process to determine the number of levels and helps to reduce the artifacts caused by occlusions in the scene. An iterative multiframe algorithm is presented for dense depth estimation that both handles low-contrast regions and produces piecewise smooth depth maps. Reconstructed views as well as arbitrary flyarounds of real scenes are presented to demonstrate the effectiveness of the approach.
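The level structure described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, only one plausible reading of the definition "the k-th level consists of points occluded by exactly k surfaces". It assumes that depth samples from neighboring frames have already been warped into the reference frame. The function name `build_levels`, the sample layout, and the merging tolerance `depth_tol` are all hypothetical.

```python
from collections import defaultdict

def build_levels(samples, depth_tol=0.05):
    """Group depth samples into occlusion levels (illustrative sketch only).

    samples: iterable of ((x, y), depth, intensity), assumed to be already
             expressed in the reference frame.
    Returns a list where levels[k][(x, y)] = (depth, intensity) holds the
    point at pixel (x, y) that lies behind exactly k nearer surfaces.
    """
    per_pixel = defaultdict(list)
    for pixel, depth, intensity in samples:
        per_pixel[pixel].append((depth, intensity))

    levels = []
    for pixel, hits in per_pixel.items():
        hits.sort(key=lambda h: h[0])  # nearest sample first
        surfaces = []
        for depth, intensity in hits:
            # Assumption: samples within depth_tol of the last kept surface
            # are repeated observations of that same surface.
            if surfaces and depth - surfaces[-1][0] <= depth_tol:
                continue
            surfaces.append((depth, intensity))
        for k, surface in enumerate(surfaces):
            while len(levels) <= k:
                levels.append({})  # the number of levels grows as needed
            levels[k][pixel] = surface
    return levels

# Toy input: pixel (0, 0) sees a near surface twice and a far surface once.
samples = [
    ((0, 0), 2.0, 120),
    ((0, 0), 2.02, 118),  # same surface, observed from a neighboring frame
    ((0, 0), 5.0, 40),    # hidden behind one surface in the reference view
    ((1, 0), 5.1, 42),    # visible directly at a neighboring pixel
]
levels = build_levels(samples)
```

In this toy input, level 0 holds the visible point at each pixel and level 1 holds the single point at pixel (0, 0) that is hidden behind the near surface. The number of levels is not fixed in advance but follows from the data, which loosely mirrors the automatic determination of levels that the abstract mentions.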
Cite this article
Chang, N.L., Zakhor, A. Constructing a Multivalued Representation for View Synthesis. International Journal of Computer Vision 45, 157–190 (2001). https://doi.org/10.1023/A:1012476031602
DOI: https://doi.org/10.1023/A:1012476031602