Boundary matting for view synthesis

https://doi.org/10.1016/j.cviu.2006.02.005

Abstract

In the last few years, new view synthesis has emerged as an important application of 3D stereo reconstruction. While the quality of stereo has improved, it is still imperfect, and a unique depth is typically assigned to every pixel. This is problematic at object boundaries, where the pixel colors are mixtures of foreground and background colors. Interpolating views without explicitly accounting for this effect results in objects with a “cut-out” appearance. To produce seamless view interpolation, we propose a method called boundary matting, which represents each occlusion boundary as a 3D curve. We show how this method exploits multiple views to perform fully automatic alpha matting and to simultaneously refine stereo depths at the boundaries. The key to our approach is the 3D representation of occlusion boundaries estimated to sub-pixel accuracy. Starting from an initial estimate derived from stereo, we optimize the curve parameters and the foreground colors near the boundaries. Our objective function maximizes consistency with the input images, favors boundaries aligned with strong edges, and damps large perturbations of the curves. Experimental results suggest that this method enables high-quality view synthesis with reduced matting artifacts.

Introduction

Although stereo correspondence was one of the first problems in computer vision to be extensively studied, automatically obtaining dense and accurate estimates of depth from multiple images remains a challenging problem [1].

Most stereo research has been concerned solely with methods for producing accurate depth maps, so interpolated views are rarely evaluated as results. By contrast, our explicit goal is superior view synthesis from stereo. Even for easy scenes in which all objects are opaque, diffuse, and well-textured, state-of-the-art stereo techniques often fail to generate high-quality interpolated views. Even if a perfect depth map were available, current methods for view interpolation share two major limitations:

  • Sampling blur. There is an effective loss of resolution caused by resampling and blending the input views.

  • Boundary artifacts. Foreground objects seem to pop out of the scene, as in bad blue-screen composites, because most current methods do not perform matting to resolve mixed pixels at object boundaries into their foreground and background components. (There are a few notable exceptions, as discussed in the next section.)

In this paper, we focus on the issue of boundary artifacts and propose a technique we call boundary matting to reduce such artifacts. Our technique, as outlined in Fig. 2, Fig. 3, combines ideas from image matting and stereo to resolve mixed boundary pixels. Our approach estimates 3D curves across multiple views, using stereo data to bootstrap this estimation.

The key feature of our approach is that occlusion boundaries are represented in 3D. This results in several improvements over the state of the art. First, compared to video matting [2] and other methods that recover pixel-level mattes for the input views [3], [4], [5], [6], our method is theoretically better suited to view synthesis, because it avoids the blurring associated with resampling those mattes (Fig. 1). Second, our method performs automatic matting from imperfect stereo data, fully incorporating multiple views, for large-scale opaque objects. Third, our method exploits information from matting to refine stereo disparities along occlusion boundaries. Fourth, our method estimates occlusion boundaries to sub-pixel accuracy, suitable for super-resolution or zooming. Fifth, our error metric is symmetric with respect to the input images, and so does not overly favor specific frames.

Our approach is based on several assumptions. First, we assume that the scene is made up of opaque Lambertian surfaces, i.e., surfaces that satisfy color constancy across the different input views. In practice, we can handle scenes that deviate somewhat from this assumption by treating non-Lambertian effects near object boundaries as noise. Moreover, we do not consider wide-baseline stereo configurations, where these effects are most pronounced. Another important assumption is that the projected 2D boundaries correspond to the same 3D edge of an object. This is strictly true only for planar objects; however, the approximation is reasonable for small camera motions or for relatively flat or distant objects (see Section 3.1).

Section snippets

Previous work

In their seminal blue-screen matting paper, Smith and Blinn [7] review traditional film-based matting techniques and propose a triangulation method for matting static foreground objects using multiple images taken with different backgrounds (see Section 3). More recent matting research has focused on natural image matting, where the goal is to estimate the matte from a single image, given regions hand-labelled as completely foreground and background [8], [9], [10], [11], [12], [4]. These

Image formation model

To model the matting effects at occlusion boundaries, we use the well-known compositing equation [7], [25]

C = αF + (1 − α)B,

which describes the observed composite color C as a blend of the foreground color F and the background color B according to the opacity α. The alpha matte is typically given at the pixel level, so fractional α’s may be due to partial pixel coverage of foreground objects at their boundaries or to true semi-transparency. In this work, we focus exclusively on the case where objects are
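The compositing equation is applied per pixel and per color channel. As a minimal illustrative sketch (not the paper's implementation), broadcasting in NumPy makes the blend a one-liner:

```python
import numpy as np

def composite(alpha, foreground, background):
    """Blend foreground over background per the compositing equation
    C = alpha * F + (1 - alpha) * B, applied per pixel and channel."""
    alpha = alpha[..., np.newaxis]  # broadcast alpha over color channels
    return alpha * foreground + (1.0 - alpha) * background

# A 2x2 toy image: alpha = 1 is pure foreground, alpha = 0 pure background.
alpha = np.array([[1.0, 0.5],
                  [0.25, 0.0]])
F = np.full((2, 2, 3), 1.0)   # white foreground
B = np.full((2, 2, 3), 0.0)   # black background
C = composite(alpha, F, B)
# Mixed pixels take intermediate values, e.g. C[0, 1] == [0.5, 0.5, 0.5]
```

Fractional alphas, as in the two middle pixels here, are exactly the mixed boundary pixels the method sets out to resolve.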

Initialization using stereo data

The starting point for boundary matting is an initialization derived from stereo and the attendant camera calibration. Boundary matting can use stereo data from any source; however, we chose to use results generated with [27] because its performance at occlusion boundaries was reasonable and an implementation was readily available. This method computes stereo by combining shiftable windows for matching with global minimization using graph cuts for visibility reasoning.
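As a rough illustration of where such an initialization comes from (a simplified stand-in, not the spline fitting or graph-cut machinery of the cited method), candidate occlusion boundaries can be seeded from disparity discontinuities in the stereo output:

```python
import numpy as np

def boundary_pixels(disparity, jump_threshold=1.0):
    """Locate candidate occlusion boundaries in a stereo disparity map.

    Marks a pixel whenever its disparity differs from a 4-neighbor by more
    than `jump_threshold`; such depth discontinuities are where mixed
    foreground/background pixels occur.
    """
    d = np.asarray(disparity, dtype=float)
    mask = np.zeros(d.shape, dtype=bool)
    # Compare each pixel with its right and lower neighbors.
    horiz = np.abs(np.diff(d, axis=1)) > jump_threshold
    vert = np.abs(np.diff(d, axis=0)) > jump_threshold
    mask[:, :-1] |= horiz
    mask[:, 1:] |= horiz
    mask[:-1, :] |= vert
    mask[1:, :] |= vert
    return mask

# Toy disparity map: a "foreground" block of disparity 8 over background 2.
d = np.full((6, 6), 2.0)
d[2:4, 2:4] = 8.0
mask = boundary_pixels(d, jump_threshold=1.0)
# Pixels on either side of the 8-vs-2 disparity step are flagged.
```

Pixels flagged this way (on both sides of the jump) delimit the region where boundary curves are initialized and where the matte must later be estimated.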

While initialization

Parameter optimization

Now that we have constructed the clean-plate background, B (Section 4.2), and obtained initial estimates for the parameters of each boundary curve, θ0 (Section 4.1), and the foreground colors, F0 (Section 4.3), we are in a position to refine these estimates to better fit the images.

Note that the objective function, Eq. (6), is highly non-linear, with bilinearity in the variables, perspective projection, and a complicated form for alpha as the partial pixel coverage of a projected spline
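To make the bilinear structure concrete, here is a toy 1-D sketch (an assumption-laden simplification, not the paper's objective): along a single scanline, a linear ramp one pixel wide stands in for partial pixel coverage, the boundary position theta and the foreground color f are the unknowns, and both are recovered jointly by nonlinear least squares against the known background:

```python
import numpy as np
from scipy.optimize import least_squares

def alpha_of(theta, x):
    # Linear ramp one pixel wide: a stand-in for the partial pixel
    # coverage of a boundary curve crossing the scanline at theta.
    return np.clip(theta - x + 0.5, 0.0, 1.0)

def residuals(params, x, b, observed):
    # Residual of the compositing model: alpha*f + (1-alpha)*b - C.
    # Note the bilinearity: alpha depends on theta, and multiplies f.
    theta, f = params
    alpha = alpha_of(theta, x)
    return alpha * f + (1 - alpha) * b - observed

# Synthesize observations with true theta = 3.3, f = 0.9 over a known
# background b = 0.1, then recover both unknowns from a rough start.
x = np.arange(8, dtype=float)
b = 0.1
observed = alpha_of(3.3, x) * 0.9 + (1 - alpha_of(3.3, x)) * b
fit = least_squares(residuals, x0=[3.0, 0.7], args=(x, b, observed))
theta_hat, f_hat = fit.x  # converges to (3.3, 0.9)
```

The real objective optimizes 3D spline control points projected into all input views, with additional edge-alignment and damping terms, but the alternating roles of boundary geometry and foreground color are the same.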

Results

For all datasets, we used five input views, with the middle view designated as the reference view for initialization. While our prototype system was not designed for efficiency, a typical run for a 300-pixel boundary in five views took approximately five minutes, converging within 20 iterations.

For our first experiment, we used a synthetic dataset (448 × 336 pixels), consisting of a planar ellipse-shaped sprite with pure translation relative to the background, to investigate the

Concluding remarks

For seamless view interpolation, mixed boundary pixels must be resolved into foreground and background components. Boundary matting appears to be a useful tool for addressing this problem in an automatic way. Using 3D curves to model occlusion boundaries is a natural representation that provides several benefits, including the ability to super-resolve the depth maps near occlusion boundaries.

A current limitation of our approach is its lack of reasoning about color statistics, which has proven

References (32)

  • D. Scharstein et al., A taxonomy and evaluation of dense two-frame stereo correspondence algorithms, Int. J. Comput. Vision (2002)
  • Y.-Y. Chuang, A. Agarwala, B. Curless, D.H. Salesin, R. Szeliski, Video matting of complex scenes, in: Proc. ACM...
  • Y. Wexler, A.W. Fitzgibbon, A. Zisserman, Bayesian estimation of layers from multiple images, in: Proc. ECCV, vol. 3,...
  • H.-Y. Shum, J. Sun, S. Yamazaki, Y. Li, C.-K. Tang, Pop-up light field: an interactive image-based modeling and...
  • C.L. Zitnick, S.B. Kang, M. Uyttendaele, S. Winder, R. Szeliski, High-quality video view interpolation using a layered...
  • R. Szeliski, P. Golland, Stereo matching with transparency and matting, in: Proc. ICCV, 1998, pp....
  • A.R. Smith, J.F. Blinn, Blue screen matting, in: Proc. ACM SIGGRAPH, 1996, pp....
  • M. Ruzon, C. Tomasi, Alpha estimation in natural images, in: Proc. CVPR, 2000, pp....
  • Y.-Y. Chuang, B. Curless, D.H. Salesin, R. Szeliski, A Bayesian approach to digital matting, in: Proc. CVPR, 2001, pp....
  • P. Hillman, J. Hannah, D. Renshaw, Alpha channel estimation in high resolution image and image sequences, in: Proc....
  • C. Rother, A. Blake, V. Kolmogorov, “GrabCut”—Interactive foreground extraction using iterated graph cuts, in: Proc....
  • J. Sun, J. Jia, C.-K. Tang, H.-Y. Shum, Poisson matting, in: Proc. ACM SIGGRAPH, 2004, pp....
  • R. Szeliski, S. Avidan, P. Anandan, Layer extraction from multiple images containing reflections and transparency, in:...
  • Y. Tsin, S. Kang, R. Szeliski, Stereo matching with reflections and translucency, in: Proc. CVPR, 2003, pp....
  • M. Irani et al., Computing occluding and transparent motions, Int. J. Comput. Vision (1994)
  • J.D. Bonet, P. Viola, Roxels: responsibility weighted 3D volume reconstruction, in: Proc. ICCV, 1999, pp....