Boundary matting for view synthesis

https://doi.org/10.1016/j.cviu.2006.02.005

Abstract

In the last few years, new view synthesis has emerged as an important application of 3D stereo reconstruction. While the quality of stereo has improved, it is still imperfect, and a unique depth is typically assigned to every pixel. This is problematic at object boundaries, where the pixel colors are mixtures of foreground and background colors. Interpolating views without explicitly accounting for this effect results in objects with a “cut-out” appearance. To produce seamless view interpolation, we propose a method called boundary matting, which represents each occlusion boundary as a 3D curve. We show how this method exploits multiple views to perform fully automatic alpha matting and to simultaneously refine stereo depths at the boundaries. The key to our approach is the 3D representation of occlusion boundaries estimated to sub-pixel accuracy. Starting from an initial estimate derived from stereo, we optimize the curve parameters and the foreground colors near the boundaries. Our objective function maximizes consistency with the input images, favors boundaries aligned with strong edges, and damps large perturbations of the curves. Experimental results suggest that this method enables high-quality view synthesis with reduced matting artifacts.

Introduction

Although stereo correspondence was one of the first problems in computer vision to be extensively studied, automatically obtaining dense and accurate estimates of depth from multiple images remains a challenging problem [1].

Most stereo research has been concerned solely with methods for producing accurate depth maps, so interpolated views are rarely evaluated as results. By contrast, our explicit goal is superior view synthesis from stereo. Even for easy scenes in which all objects are opaque, diffuse, and well-textured, state-of-the-art stereo techniques often fail to generate high-quality interpolated views. Even if a perfect depth map were available, current methods for view interpolation share two major limitations:

  • Sampling blur. There is an effective loss of resolution caused by resampling and blending the input views.

  • Boundary artifacts. Foreground objects seem to pop out of the scene, as in bad blue-screen composites, because most current methods do not perform matting to resolve mixed pixels at object boundaries into their foreground and background components. (There are a few notable exceptions, as discussed in the next section.)

In this paper, we focus on the issue of boundary artifacts and propose a technique we call boundary matting to reduce such artifacts. Our technique, as outlined in Fig. 2, Fig. 3, combines ideas from image matting and stereo to resolve mixed boundary pixels. Our approach estimates 3D curves across multiple views, using stereo data to bootstrap this estimation.

The key feature of our approach is that occlusion boundaries are represented in 3D. This results in several improvements over the state of the art. First, compared to video matting [2] and other methods that recover pixel-level mattes for the input views [3], [4], [5], [6], our method is theoretically better suited to view synthesis, because it avoids the blurring associated with resampling those mattes (Fig. 1). Second, our method performs automatic matting from imperfect stereo data, fully incorporating multiple views, for large-scale opaque objects. Third, our method exploits information from matting to refine stereo disparities along occlusion boundaries. Fourth, our method estimates occlusion boundaries to sub-pixel accuracy, suitable for super-resolution or zooming. Fifth, our error metric is symmetric with respect to the input images, and so does not overly favor specific frames.

Our approach is based on several assumptions. First, we assume that the scene is made up of opaque Lambertian surfaces, i.e., surfaces that satisfy color constancy across the different input views. In practice, we can handle scenes that deviate somewhat from this assumption by treating non-Lambertian effects near object boundaries as noise. Moreover, we do not consider wide-baseline stereo configurations, where these effects are most pronounced. Another important assumption is that the projected 2D boundaries correspond to the same 3D edge of an object. This is strictly true only for planar objects; however, the approximation is reasonable for small camera motions or for relatively flat or distant objects (see Section 3.1).

Section snippets

Previous work

In their seminal blue-screen matting paper, Smith and Blinn [7] review traditional film-based matting techniques and propose a triangulation method for matting static foreground objects using multiple images taken with different backgrounds (see Section 3). More recent matting research has focused on natural image matting, where the goal is to estimate the matte from a single image, given regions hand-labelled as completely foreground and background [8], [9], [10], [11], [12], [4]. These

Image formation model

To model the matting effects at occlusion boundaries, we use the well-known compositing equation [7], [25]

C = αF + (1 − α)B,

which describes the observed composite color C as a blend of the foreground color F and the background color B according to the opacity α. The alpha matte is typically given at the pixel level, so fractional α’s may be due to partial pixel coverage of foreground objects at their boundaries or to true semi-transparency. In this work, we focus exclusively on the case where objects are
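The compositing equation is applied per pixel and per color channel. As a minimal illustrative sketch (not the paper's implementation), broadcasting in NumPy makes the blend a one-liner:

```python
import numpy as np

def composite(alpha, foreground, background):
    """Blend foreground over background per the compositing equation
    C = alpha * F + (1 - alpha) * B, applied per pixel and channel."""
    alpha = alpha[..., np.newaxis]  # broadcast alpha over color channels
    return alpha * foreground + (1.0 - alpha) * background

# A 2x2 toy image: alpha = 1 is pure foreground, alpha = 0 pure background.
alpha = np.array([[1.0, 0.5],
                  [0.25, 0.0]])
F = np.full((2, 2, 3), 1.0)   # white foreground
B = np.full((2, 2, 3), 0.0)   # black background
C = composite(alpha, F, B)
# Mixed pixels take intermediate values, e.g. C[0, 1] == [0.5, 0.5, 0.5]
```

Fractional alphas, as in the two middle pixels here, are exactly the mixed boundary pixels the method sets out to resolve.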

Initialization using stereo data

The starting point for boundary matting is an initialization derived from stereo and the attendant camera calibration. Boundary matting can use stereo data from any source; however, we chose to use results generated with [27] because its performance at occlusion boundaries was reasonable and an implementation was readily available. This method computes stereo by combining shiftable windows for matching with global minimization using graph cuts for visibility reasoning.
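As a rough illustration of where such an initialization comes from (a simplified stand-in, not the spline fitting or graph-cut machinery of the cited method), candidate occlusion boundaries can be seeded from disparity discontinuities in the stereo output:

```python
import numpy as np

def boundary_pixels(disparity, jump_threshold=1.0):
    """Locate candidate occlusion boundaries in a stereo disparity map.

    Marks a pixel whenever its disparity differs from a 4-neighbor by more
    than `jump_threshold`; such depth discontinuities are where mixed
    foreground/background pixels occur.
    """
    d = np.asarray(disparity, dtype=float)
    mask = np.zeros(d.shape, dtype=bool)
    # Compare each pixel with its right and lower neighbors.
    horiz = np.abs(np.diff(d, axis=1)) > jump_threshold
    vert = np.abs(np.diff(d, axis=0)) > jump_threshold
    mask[:, :-1] |= horiz
    mask[:, 1:] |= horiz
    mask[:-1, :] |= vert
    mask[1:, :] |= vert
    return mask

# Toy disparity map: a "foreground" block of disparity 8 over background 2.
d = np.full((6, 6), 2.0)
d[2:4, 2:4] = 8.0
mask = boundary_pixels(d, jump_threshold=1.0)
# Pixels on either side of the 8-vs-2 disparity step are flagged.
```

Pixels flagged this way (on both sides of the jump) delimit the region where boundary curves are initialized and where the matte must later be estimated.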

While initialization

Parameter optimization

Now that we have constructed the clean-plate background, B (Section 4.2), and obtained initial estimates for the parameters of each boundary curve, θ0 (Section 4.1), and the foreground colors, F0 (Section 4.3), we are in a position to refine these estimates to better fit the images.

Note that the objective function, Eq. (6), is highly non-linear, with bilinearity in the variables, perspective projection, and a complicated form for alpha as the partial pixel coverage of a projected spline
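To make the bilinear structure concrete, here is a toy 1-D sketch (an assumption-laden simplification, not the paper's objective): along a single scanline, a linear ramp one pixel wide stands in for partial pixel coverage, the boundary position theta and the foreground color f are the unknowns, and both are recovered jointly by nonlinear least squares against the known background:

```python
import numpy as np
from scipy.optimize import least_squares

def alpha_of(theta, x):
    # Linear ramp one pixel wide: a stand-in for the partial pixel
    # coverage of a boundary curve crossing the scanline at theta.
    return np.clip(theta - x + 0.5, 0.0, 1.0)

def residuals(params, x, b, observed):
    # Residual of the compositing model: alpha*f + (1-alpha)*b - C.
    # Note the bilinearity: alpha depends on theta, and multiplies f.
    theta, f = params
    alpha = alpha_of(theta, x)
    return alpha * f + (1 - alpha) * b - observed

# Synthesize observations with true theta = 3.3, f = 0.9 over a known
# background b = 0.1, then recover both unknowns from a rough start.
x = np.arange(8, dtype=float)
b = 0.1
observed = alpha_of(3.3, x) * 0.9 + (1 - alpha_of(3.3, x)) * b
fit = least_squares(residuals, x0=[3.0, 0.7], args=(x, b, observed))
theta_hat, f_hat = fit.x  # converges to (3.3, 0.9)
```

The real objective optimizes 3D spline control points projected into all input views, with additional edge-alignment and damping terms, but the alternating roles of boundary geometry and foreground color are the same.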

Results

For all datasets, we used five input views, with the middle view designated as the reference view for initialization. While our prototype system was not designed for efficiency, a typical run for a 300-pixel boundary in five views took approximately five minutes, converging within 20 iterations.

For our first experiment, we used a synthetic dataset (448 × 336 pixels), consisting of a planar ellipse-shaped sprite with pure translation relative to the background, to investigate the

Concluding remarks

For seamless view interpolation, mixed boundary pixels must be resolved into foreground and background components. Boundary matting appears to be a useful tool for addressing this problem in an automatic way. Using 3D curves to model occlusion boundaries is a natural representation that provides several benefits, including the ability to super-resolve the depth maps near occlusion boundaries.

A current limitation of our approach is its lack of reasoning about color statistics, which has proven

References (32)

  • D. Scharstein et al., A taxonomy and evaluation of dense two-frame stereo correspondence algorithms, Int. J. Comput. Vision (2002)
  • Y.-Y. Chuang, A. Agarwala, B. Curless, D.H. Salesin, R. Szeliski, Video matting of complex scenes, in: Proc. ACM...
  • Y. Wexler, A.W. Fitzgibbon, A. Zisserman, Bayesian estimation of layers from multiple images, in: Proc. ECCV, vol. 3,...
  • H.-Y. Shum, J. Sun, S. Yamazaki, Y. Li, C.-K. Tang, Pop-up light field: an interactive image-based modeling and...
  • C.L. Zitnick, S.B. Kang, M. Uyttendaele, S. Winder, R. Szeliski, High-quality video view interpolation using a layered...
  • R. Szeliski, P. Golland, Stereo matching with transparency and matting, in: Proc. ICCV, 1998, pp....
  • A.R. Smith, J.F. Blinn, Blue screen matting, in: Proc. ACM SIGGRAPH, 1996, pp....
  • M. Ruzon, C. Tomasi, Alpha estimation in natural images, in: Proc. CVPR, 2000, pp....
  • Y.-Y. Chuang, B. Curless, D.H. Salesin, R. Szeliski, A Bayesian approach to digital matting, in: Proc. CVPR, 2001, pp....
  • P. Hillman, J. Hannah, D. Renshaw, Alpha channel estimation in high resolution image and image sequences, in: Proc....
  • C. Rother, A. Blake, V. Kolmogorov, “GrabCut”—Interactive foreground extraction using iterated graph cuts, in: Proc....
  • J. Sun, J. Jia, C.-K. Tang, H.-Y. Shum, Poisson matting, in: Proc. ACM SIGGRAPH, 2004, pp....
  • R. Szeliski, S. Avidan, P. Anandan, Layer extraction from multiple images containing reflections and transparency, in:...
  • Y. Tsin, S. Kang, R. Szeliski, Stereo matching with reflections and translucency, in: Proc. CVPR, 2003, pp....
  • M. Irani et al., Computing occluding and transparent motions, Int. J. Comput. Vision (1994)
  • J.D. Bonet, P. Viola, Roxels: responsibility weighted 3D volume reconstruction, in: Proc. ICCV, 1999, pp....