1 Introduction

Multimedia technology has undergone an unprecedented evolution in the new century, especially in the field of content display. Stereoscopic 3D (S3D), with its immersive 3D experience, has become a mainstream consumer product in both cinema and the home [1]. However, the indispensable glasses remain an unbridgeable gap that restricts its further popularization in daily life. Multiview autostereoscopic display (MAD) supports motion-parallax viewing within a limited range and has the advantage of being glasses-free [2].

Considering existing device capabilities, it is unrealistic to directly capture, store, and transmit the huge amount of data required by multiple views. Based on the multiview-plus-depth (MVD) 3D format, C. Fehn proposed depth-image-based rendering (DIBR) as a view synthesis method that can generate N views from M views (M < N) [3]. The final projection location is obtained by mapping each pixel from the image coordinate system to the world coordinate system and back, using the camera intrinsic and extrinsic parameters together with the depth maps. However, inaccurate depth estimation lowers the synthesis quality. Moreover, this discrete projection can easily produce occlusion and disocclusion regions.

To address the problems exposed by DIBR, Disney Research developed a view synthesis scheme called Image-Domain-Warping (IDW) [4], based on a nonlinear disparity mapping algorithm [5], which performs the mapping directly in image space. The original image is covered by a regular grid mesh \( G(V,E,Q) \) with vertices \( V \), edges \( E \), and quads \( Q \) [6]. A global optimization yields a warping function that preserves the spatial structure of salient scene content by concentrating the deformation in homogeneous areas, a strategy widely used in image retargeting [8]. Because the mapping is continuous, IDW does not produce the holes that DIBR does. Nonetheless, the Lucas-Kanade (L-K) method [11] used for stereo matching of feature points is time-consuming, and an obvious stretch effect appears on the border of the synthesized view when the disparity is large.

Inspired by IDW, this paper proposes a simpler warping scheme that aims to reduce the computational complexity of the energy equation and to remove the stretch effect on the image border. SIFT [12] feature points are used only for stereo matching, and moving least squares (MLS) propagates every feature point to every grid vertex with a distance-based weight, making full use of the SIFT information. We also introduce a grid-line constraint [8] instead of extracting additional vertical edge-points. Finally, a novel blending algorithm is proposed to remove the stretch effect on the left or right image border. The resulting view synthesis method achieves suitable visual quality with high efficiency.

2 Proposed Method

To begin, a brief explanation of the basic IDW theory is necessary. It is worth noting that IDW covers the original image with a regular mesh \( G \), reducing the solution space to the grid vertices \( V \) so that the resulting system of linear equations does not become too large. Unlike DIBR, which relies on dense depth estimation, IDW uses sparse feature points \( P \) extracted from the reference image; these points specify where the corresponding content should appear in the synthesized image. A saliency map \( S \) is also used to steer the deformation into non-salient regions. An energy equation \( E(w) \) is defined that consists of three constraints: a data term \( E_{d} \), a spatial smoothness term \( E_{s} \), and a temporal smoothness term \( E_{t} \), where \( w \) denotes the warp function defined at each grid vertex. Each term is weighted by a parameter:

$$ E(w) = \lambda_{d} E_{d} (w) + \lambda_{s} E_{s} (w) + \lambda_{t} E_{t} (w) $$
(1)

By minimizing the energy function, the warp defined at the regular grid vertices is computed. For non-grid positions, the warp is obtained by bilinear interpolation. Finally, the synthesized image is rendered using the computed warp function.
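
For illustration, the following sketch (in Python with NumPy; the grid layout, array shapes, and the quad size `cell` are assumptions of this example rather than details specified by IDW) shows how a warp stored at regular grid vertices can be bilinearly interpolated at an arbitrary pixel.

```python
import numpy as np

def interpolate_warp(x, y, grid_w, cell):
    """Bilinearly interpolate the warp at a non-grid pixel (x, y).

    grid_w: (rows, cols, 2) array of warped positions of the regular grid vertices.
    cell:   width/height of one grid quad in pixels (assumed square here).
    """
    rows, cols = grid_w.shape[:2]
    c, r = x / cell, y / cell
    c0 = min(int(c), cols - 2)           # clamp so the 2x2 neighborhood stays in range
    r0 = min(int(r), rows - 2)
    a, b = c - c0, r - r0                # fractional position inside the quad
    top = (1 - a) * grid_w[r0, c0] + a * grid_w[r0, c0 + 1]
    bot = (1 - a) * grid_w[r0 + 1, c0] + a * grid_w[r0 + 1, c0 + 1]
    return (1 - b) * top + b * bot       # warped position of (x, y)
```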

In this paper, building on the IDW framework described above, we propose an improved view synthesis scheme, as shown in Fig. 1. After extracting the feature points and the saliency map from the input images, we use moving least squares (MLS) to obtain the initial warp position of each grid vertex, making full use of the feature points. We also add a grid-line constraint to keep the grid lines from over-bending. Iterative optimization is then applied to obtain the final warp. Image blending is essential to remove the stretch effect at the image border. Each step is described in the following sections.

Fig. 1. Overview of the proposed method

2.1 Image Warping

MLS is a classical algorithm widely used in image deformation [13, 14]. A set of control points and their positions after deformation is specified in advance, and the task is to find the deformed location of every other point in the image.

In our scheme, with the left and right views as input images, our goal is to obtain suitable warps \( w_{l} \) and \( w_{r} \) for the two reference images, respectively (hereinafter we use \( w \) as a unified notation). First, a sparse SIFT point set \( P \) is obtained, with outliers removed by RANSAC [15]; the locations of the matched pairs indicate the disparity between corresponding points. We use this point set \( P(p_{l} , \, p_{r} ) \) as the control points of the MLS algorithm. For a view located midway between the two input views, its disparity with respect to the left or right view is:

$$ d = \frac{{p_{r} - p_{l} }}{2} $$
(2)

So the deformation location \( Q(q_{l} , \, q_{r} ) \) can be calculated as:

$$ q_{l} = p_{l} + d $$
(3)
$$ q_{r} = p_{r} - d $$
(4)

Because our warp \( w \) is defined at the grid vertices, we need to propagate the information carried by the feature points to the mesh vertices in order to obtain the final warp function. Following MLS, we solve for the best \( w \) at every grid vertex such that every \( p \) is warped to its \( q \). The data term \( E_{d} \) can be formulated as:

$$ E_{d} (w) = \sum\limits_{i = 1}^{V} {\sum\limits_{j = 1}^{P} {f_{ij} } } \left\| {w_{i} (p_{j} ) - q_{j} } \right\|^{2} $$
(5)

where \( p_{j} \) and \( q_{j} \) are the original and target feature points, and \( f_{ij} \) is the weight with which each \( p_{j} \) contributes to vertex \( v_{i} \):

$$ f_{ij} = \frac{1}{{\left\| {p_{j} - v_{i} } \right\|^{2} }} $$
(6)

In particular, we apply the MLS-based image deformation algorithm to obtain the initial warp function instead of adding it to the energy equation alongside the other constraints. This choice not only makes full use of the efficiency of MLS, but also keeps the system of linear equations small. The rigorous mathematical derivation is given in [14].
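
The sketch below illustrates this initialization under a simplifying assumption: instead of the full affine/similarity MLS deformation of [14], each vertex simply takes the weighted average displacement of all control points, using the inverse-square-distance weights of Eq. (6).

```python
import numpy as np

def initial_vertex_warp(vertices, p, q, eps=1e-8):
    """Propagate feature-point displacements to grid vertices (translation-only MLS sketch).

    vertices: (V, 2) regular grid vertex positions.
    p, q:     (N, 2) control points and their target positions from Eqs. (2)-(4).
    """
    vertices = np.asarray(vertices, dtype=float)
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    warped = vertices.copy()
    for i, v in enumerate(vertices):
        f = 1.0 / (np.sum((p - v) ** 2, axis=1) + eps)        # Eq. (6): inverse-square-distance weights
        warped[i] = v + (f[:, None] * (q - p)).sum(axis=0) / f.sum()  # weighted average displacement
    return warped
```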

In [2, 4, 16], besides SIFT points, vertical edge-points are introduced because they represent the image structure well; L-K can estimate their disparities accurately but is time-consuming. To protect the spatial structure of the image without excessive computation, we instead introduce a grid-line energy term that prevents the grid lines from bending severely, since salient objects may occupy multiple connected quads [8]. This term preserves edge orientations well. Figure 2 gives a comparative result: Fig. 2(a) suffers from serious edge deformation, especially around the holder of the second ball, whereas Fig. 2(b) keeps the holder free of bending deformation thanks to the added grid-line constraint.

Fig. 2. Comparison of whether to add the grid line constraint; (a) Without grid line constraint; (b) Grid line constraint added.

We formulate the grid line constraint term as

$$ E_{l} (w) = \sum\limits_{E} {S_{ij} } \left\| {w(v_{i} ) - w(v_{j} ) - l_{ij} (v_{i} - v_{j} )} \right\|^{2} $$
(7)
$$ l_{ij} = \frac{{\left\| {v_{i}^{{\prime }} - v_{j}^{{\prime }} } \right\|}}{{\left\| {v_{i} - v_{j} } \right\|}} $$
(8)
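
A small sketch of how Eq. (7) can be evaluated, under the reading of Eq. (8) in which \( l_{ij} \) is the length ratio between the warped edge (taken from the current, e.g. previous-iteration, vertex estimate) and the original edge, so that edges may scale but are penalized for bending:

```python
import numpy as np

def grid_line_energy(w_v, v, edges, saliency, eps=1e-8):
    """Grid-line term of Eq. (7).

    w_v:      (V, 2) current warped vertex positions w(v).
    v:        (V, 2) original vertex positions.
    edges:    iterable of (i, j) index pairs, the grid edges E.
    saliency: per-edge saliency weights S_ij.
    """
    energy = 0.0
    for (i, j), s in zip(edges, saliency):
        e_orig = v[i] - v[j]
        e_warp = w_v[i] - w_v[j]
        l_ij = np.linalg.norm(e_warp) / (np.linalg.norm(e_orig) + eps)  # Eq. (8): length ratio
        r = e_warp - l_ij * e_orig                                      # bending residual of Eq. (7)
        energy += s * float(np.dot(r, r))
    return energy
```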

After the energy equation has been constructed, \( E(w) \) is minimized by solving the resulting linear equations iteratively. The iteration terminates when the vertex movements with respect to the previous iteration are all smaller than 0.5.
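
The outer loop can be sketched as follows; `solve_once` is a hypothetical helper (not part of the method's description) that re-linearizes the energy around the current vertices (e.g., recomputing \( l_{ij} \)) and solves the resulting sparse least-squares system.

```python
import numpy as np

def optimize_warp(v_init, solve_once, tol=0.5, max_iter=50):
    """Iteratively minimize E(w); stop when every vertex moves less than `tol` (0.5 in the paper)."""
    v_cur = np.asarray(v_init, dtype=float)
    for _ in range(max_iter):
        v_new = solve_once(v_cur)                                # one linearized least-squares solve
        if np.max(np.linalg.norm(v_new - v_cur, axis=1)) < tol:
            return v_new                                         # converged
        v_cur = v_new
    return v_cur
```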

2.2 Image Blending

Because the adjacent camera positions have different fields of view, a stretch effect can occur on the right or left border if only the left or only the right image is used for warping. This paper introduces a novel blending method that uses the border information of one synthesized image to cover the stretched region in the other. The basic idea of the blending algorithm is illustrated in Fig. 3.

Fig. 3. Image blending process

First, two blocks are cut from the matched and matching images, respectively, based on the relevant disparity relations. Then a cell \( C \) with the maximal gradient value is found in block \( B \). Using \( C \) as a template slid over \( B^{{\prime }} \), we search for the region \( C^{{\prime }} \) with the maximal correlation with \( C \). The correlation level \( Corrs \) is calculated as:

$$ Corrs = \frac{{\sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {C(i,j) \times C'\left( {i,j} \right)} } }}{{\left( {\sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {C(i,j)^{2} } } \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {C'(i,j)^{2} } } } \right)^{1/2} }} $$
(9)

After finding the best matching block, we stitch it into the original synthesized result according to the positional relations between the two blocks and their images.
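
A minimal sketch of this template search (Python with NumPy; the exhaustive sliding-window loop is an illustrative choice, not necessarily how the method is implemented):

```python
import numpy as np

def best_match(C, B_prime, eps=1e-8):
    """Slide template cell C over block B' and return the top-left corner maximizing Eq. (9)."""
    C = np.asarray(C, dtype=float)
    B_prime = np.asarray(B_prime, dtype=float)
    n = C.shape[0]
    H, W = B_prime.shape
    c_norm = np.sqrt(np.sum(C ** 2))
    best_score, best_pos = -1.0, (0, 0)
    for y in range(H - n + 1):
        for x in range(W - n + 1):
            Cp = B_prime[y:y + n, x:x + n]
            corrs = np.sum(C * Cp) / (c_norm * np.sqrt(np.sum(Cp ** 2)) + eps)  # Eq. (9)
            if corrs > best_score:
                best_score, best_pos = corrs, (y, x)
    return best_pos, best_score
```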

3 Experimental Results

In the experiment, we test two video sequences, one with a resolution of 1024 × 768 and the other 1280 × 960. The corresponding mesh resolutions are 30 × 40 and 40 × 50, which are much smaller than IDW's 180 × 100. In the blending process, the cell size is set to 10 × 10. Since no public implementation of IDW is available, we implemented the scheme according to our own understanding. To evaluate the performance, we use the traditional DIBR method for comparison.

The synthesized results are shown in Figs. 4 and 5. It can be seen that traditional DIBR suffers from a blurring effect caused by overlaying the left and right synthesized views, whereas our results achieve better visual quality thanks to the optimized energy equation, especially in the regions marked by red rectangles.

Fig. 4. The Balloons results. The first three are from the DIBR method: (a) Left mapping result; (b) Right mapping result; (c) Overlay of (a) and (b) at the pixel level. The following two are from our proposed method: (d) Left warping result; (e) Right warping result. (f) Ground truth; (g) Partial enlarged region of (c); (h) Partial enlarged region of (d); (i) Partial enlarged region of (e). (Color figure online)

Fig. 5. The Champagne Tower results. The first three are from the DIBR method: (a) Left mapping result; (b) Right mapping result; (c) Overlay of (a) and (b) at the pixel level. The following two are from our proposed method: (d) Left warping result; (e) Right warping result. (f) Ground truth; (g) Partial enlarged region of (c); (h) Partial enlarged region of (d); (i) Partial enlarged region of (e). (Color figure online)

4 Conclusion

In this paper, we have proposed a simple but efficient method for synthesizing a middle view from S3D inputs based on IDW. The first contribution is obtaining the initial warp positions by MLS-based image deformation. The second is the introduction of a grid-line energy term into the energy equation. Finally, we apply a novel image blending algorithm to remove the border stretch deformation. The experimental results demonstrate that the proposed method can generate synthesized images that meet human visual comfort. In the future, we plan to study IDW further and to introduce depth maps into our method, since a depth map strongly represents the spatial structure of the image.