Boundary matting for view synthesis
Introduction
Although stereo correspondence was one of the first problems in computer vision to be extensively studied, automatically obtaining dense and accurate estimates of depth from multiple images remains a challenging problem [1].
Most stereo research has been concerned solely with methods for producing accurate depth maps, so interpolated views are rarely evaluated as results. By contrast, our explicit goal is superior view synthesis from stereo. Even for easy scenes in which all objects are opaque, diffuse, and well-textured, state-of-the-art stereo techniques often fail to generate high-quality interpolated views. Even if a perfect depth map were available, current methods for view interpolation share two major limitations:
- Sampling blur. There is an effective loss of resolution caused by resampling and blending the input views.
- Boundary artifacts. Foreground objects seem to pop out of the scene, as in bad blue-screen composites, because most current methods do not perform matting to resolve mixed pixels at object boundaries into their foreground and background components. (There are a few notable exceptions, as discussed in the next section.)
In this paper, we focus on the issue of boundary artifacts and propose a technique we call boundary matting to reduce such artifacts. Our technique, outlined in Figs. 2 and 3, combines ideas from image matting and stereo to resolve mixed boundary pixels. Our approach estimates 3D curves over multiple views, using stereo data to bootstrap this estimation.
The key feature of our approach is that occlusion boundaries are represented in 3D. This results in several improvements over the state of the art. First, compared to video matting [2] and other methods that recover pixel-level mattes for the input views [3], [4], [5], [6], our method is theoretically better suited to view synthesis, because it avoids the blurring associated with resampling those mattes (Fig. 1). Second, our method performs automatic matting from imperfect stereo data, fully incorporating multiple views, for large-scale opaque objects. Third, our method exploits information from matting to refine stereo disparities along occlusion boundaries. Fourth, our method estimates occlusion boundaries to sub-pixel accuracy, suitable for super-resolution or zooming. Fifth, our error metric is symmetric with respect to the input images, and so does not overly favor specific frames.
Our approach is based on several assumptions. First, we assume that the scene is made up of opaque Lambertian surfaces, i.e., surfaces that satisfy color constancy across the different input views. In practice, we can handle scenes that deviate somewhat from this assumption, treating non-Lambertian effects near object boundaries as noise. Moreover, we do not consider wide-baseline stereo configurations, where these effects are most pronounced. Another important assumption is that the projected 2D boundaries correspond to the same 3D edge of an object. This is strictly true only for planar objects; however, the approximation is reasonable for small camera motion or for relatively flat or distant objects (see Section 3.1).
Previous work
In their seminal blue-screen matting paper, Smith and Blinn [7] review traditional film-based matting techniques and propose a triangulation method for matting static foreground objects using multiple images taken with different backgrounds (see Section 3). More recent matting research has focused on natural image matting, where the goal is to estimate the matte from a single image, given regions hand-labelled as completely foreground and background [8], [9], [10], [11], [12], [4]. These
Image formation model
To model the matting effects at occlusion boundaries, we use the well-known compositing equation [7], [25], C = αF + (1 − α)B, which describes the observed composite color C as a blend of the foreground color F and the background color B according to the opacity α. The alpha matte is typically given at the pixel level, so fractional α’s may be due to partial pixel coverage of foreground objects at their boundaries or to true semi-transparency. In this work, we focus exclusively on the case where objects are
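The compositing equation C = αF + (1 − α)B can be illustrated with a minimal numerical sketch (the function name and toy colors below are ours, for illustration only):

```python
import numpy as np

def composite(alpha, fg, bg):
    """Compositing equation: C = alpha * F + (1 - alpha) * B.

    alpha  : opacity in [0, 1]; fractional at mixed boundary pixels
    fg, bg : foreground and background colors (e.g. RGB triples)
    """
    alpha = np.asarray(alpha, dtype=float)[..., None]
    return alpha * np.asarray(fg, dtype=float) + (1.0 - alpha) * np.asarray(bg, dtype=float)

# A boundary pixel half-covered by an opaque red foreground over a blue background:
c = composite(0.5, [255, 0, 0], [0, 0, 255])  # -> [127.5, 0.0, 127.5]
```

An α of 0.5 here arises from partial pixel coverage of the opaque foreground, not from semi-transparency, which is exactly the case this work restricts itself to.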
Initialization using stereo data
The starting point for boundary matting is an initialization derived from stereo and the attendant camera calibration. Boundary matting can use stereo data from any source; however, we chose to use results generated with [27] because its performance at occlusion boundaries was reasonable and an implementation was readily available. This method computes stereo by combining shiftable windows for matching with global minimization using graph cuts for visibility reasoning.
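As a rough illustration of how stereo output can seed the boundary estimation, the sketch below flags large disparity discontinuities as candidate occlusion-boundary pixels. The function name and threshold are our own generic choices, not part of the stereo method of [27]:

```python
import numpy as np

def occlusion_boundary_mask(disparity, jump_thresh=1.0):
    """Mark pixels whose disparity differs from a horizontal or vertical
    neighbor by more than jump_thresh; such discontinuities are candidate
    occlusion boundaries from which boundary curves can be initialized."""
    d = np.asarray(disparity, dtype=float)
    mask = np.zeros(d.shape, dtype=bool)
    mask[:, :-1] |= np.abs(d[:, 1:] - d[:, :-1]) > jump_thresh   # horizontal jumps
    mask[:-1, :] |= np.abs(d[1:, :] - d[:-1, :]) > jump_thresh   # vertical jumps
    return mask

disp = np.zeros((4, 6))
disp[:, 3:] = 5.0                      # a step edge in disparity
mask = occlusion_boundary_mask(disp)   # column 2 flagged in every row
```

In practice, the flagged pixels would be linked into chains and fit with the boundary curves; this mask only shows where stereo data localizes the boundaries, up to the integer-pixel accuracy that the subsequent optimization refines.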
While initialization
Parameter optimization
Now that we have constructed the clean-plate background, B (Section 4.2), and obtained initial estimates for the parameters of each boundary curve, θ0 (Section 4.1), and the foreground colors, F0 (Section 4.3), we are in a position to refine these estimates to better fit the images.
Note that the objective function, Eq. (6), is highly non-linear, with bilinearity in the variables, perspective projection, and a complicated form for alpha as the partial pixel coverage of a projected spline
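Eq. (6) itself is not reproduced in this excerpt, but its bilinear structure can be illustrated with a toy alternating least-squares loop that updates per-pixel alphas and a shared foreground color in turn. All names and the simplified objective below are ours; the actual objective additionally couples alpha to the projected spline geometry through perspective projection, which this sketch ignores:

```python
import numpy as np

def refine_alpha_fg(C, B, F0, iters=10):
    """Toy alternating minimization of the bilinear least-squares term
        sum_i || C_i - (a_i * F + (1 - a_i) * B_i) ||^2
    over per-pixel alphas a_i and a shared foreground color F, holding
    the clean-plate background B fixed.

    C, B : (n, 3) observed and clean-plate colors at n boundary pixels
    F0   : (3,) initial foreground color estimate
    """
    C, B = np.asarray(C, dtype=float), np.asarray(B, dtype=float)
    F = np.asarray(F0, dtype=float)
    for _ in range(iters):
        # alpha step: closed-form projection onto the line from B_i to F
        d = F - B
        a = np.einsum('ij,ij->i', C - B, d) / np.maximum(
            np.einsum('ij,ij->i', d, d), 1e-12)
        a = np.clip(a, 0.0, 1.0)
        # F step: closed form with alphas fixed:
        #   F = sum_i a_i (C_i - (1 - a_i) B_i) / sum_i a_i^2
        w = a[:, None]
        den = (a ** 2).sum()
        if den > 1e-12:
            F = (w * (C - (1.0 - w) * B)).sum(axis=0) / den
    return a, F

# Noiseless toy data: two boundary pixels over a black clean plate
C = np.array([[50.0, 0.0, 0.0], [100.0, 0.0, 0.0]])
B = np.zeros((2, 3))
a, F = refine_alpha_fg(C, B, F0=[80.0, 0.0, 0.0])  # converges to a=[0.5, 1], F=[100, 0, 0]
```

Each step is an exact coordinate minimization, so the objective is non-increasing; the full problem is harder because alpha is not free per pixel but determined by the coverage of a projected spline.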
Results
For all datasets, we used five input views, with the middle view designated as the reference view for initialization. While our prototype system was not designed for efficiency, a typical run for a 300-pixel boundary in five views took approximately five minutes, converging within 20 iterations.
For our first experiment, we used a synthetic dataset (448 × 336 pixels), consisting of a planar ellipse-shaped sprite with pure translation relative to the background, to investigate the
Concluding remarks
For seamless view interpolation, mixed boundary pixels must be resolved into foreground and background components. Boundary matting appears to be a useful tool for addressing this problem in an automatic way. Using 3D curves to model occlusion boundaries is a natural representation that provides several benefits, including the ability to super-resolve the depth maps near occlusion boundaries.
A current limitation of our approach is its lack of reasoning about color statistics, which has proven
References (32)
- D. Scharstein, R. Szeliski, A taxonomy and evaluation of dense two-frame stereo correspondence algorithms, Int. J. Comput. Vision (2002)
- Y.-Y. Chuang, A. Agarwala, B. Curless, D.H. Salesin, R. Szeliski, Video matting of complex scenes, in: Proc. ACM...
- Y. Wexler, A.W. Fitzgibbon, A. Zisserman, Bayesian estimation of layers from multiple images, in: Proc. ECCV, vol. 3,...
- H.-Y. Shum, J. Sun, S. Yamazaki, Y. Li, C.-K. Tang, Pop-up light field: an interactive image-based modeling and...
- C.L. Zitnick, S.B. Kang, M. Uyttendaele, S. Winder, R. Szeliski, High-quality video view interpolation using a layered...
- R. Szeliski, P. Golland, Stereo matching with transparency and matting, in: Proc. ICCV, 1998, pp....
- A.R. Smith, J.F. Blinn, Blue screen matting, in: Proc. ACM SIGGRAPH, 1996, pp....
- M. Ruzon, C. Tomasi, Alpha estimation in natural images, in: Proc. CVPR, 2000, pp....
- Y.-Y. Chuang, B. Curless, D.H. Salesin, R. Szeliski, A Bayesian approach to digital matting, in: Proc. CVPR, 2001, pp....
- P. Hillman, J. Hannah, D. Renshaw, Alpha channel estimation in high resolution image and image sequences, in: Proc....
- M. Irani, B. Rousso, S. Peleg, Computing occluding and transparent motions, Int. J. Comput. Vision (1994)