1 Introduction

Image stitching has been extensively studied in recent years and applied in many fields, such as scene understanding [31], virtual reality [12], and photogrammetry and remote sensing [10]. However, existing methods often rely on the assumptions [22] that the imaging scene is approximately planar, or that the images are taken under simple camera rotations. These conditions are not always met in practice, especially for photos taken casually with smartphones or cameras, as demonstrated in Fig. 1. The main challenges are:

  • Global warps [2] and even local warps [28] struggle to handle complex scenes with different dominant planes. The former adopt only one transformation, which lacks the flexibility needed for complex scenes. The latter often ignore the different planes in the scene, which causes large alignment errors.

  • Existing methods do not work well on images with large parallax, caused by arbitrary shooting positions and viewing angles [30]. As a result, they introduce noticeable artifacts or objectionable distortions.

Fig. 1.

An illustration of our method. The top row displays two casually taken input images of a scene that contains multiple distinct planes. The bottom row shows the detected planes and transition regions (left) and the final stitching result (right). Planar regions are warped by the planar homographies \(\mathbf {H}_i\), and the transition regions are transformed by the local weighted homography \(\sum _{i=1}^{5}\alpha _i \mathbf {H}_i\).

Many approaches have been proposed to solve these problems. The main solution is spatially varying warps, e.g. multiple local warps [7] or a global warp with mesh optimization [29, 30], which provide flexible warps to handle images with moderate parallax. However, these methods depend heavily on the number and distribution of point correspondences. In addition, distortions caused by non-linear transformations [9], e.g. projective and structure deformations, are often clearly visible. Many methods have been developed to mitigate distortions, such as similarity transformation constraints [4, 5, 14, 25] or geometric structure cues [24, 26, 30]. However, the reduction is limited for scenes with rich contents and structures. Besides, large parallax remains challenging for these methods [15].

Another solution is seam-assist image stitching, which has the advantage of dealing with large parallax. The common way is to perform seam cutting after image alignment to hide the inevitable ghosting or artifacts [7, 30]. The seam line is often selected by the color or gradient difference, image edges, etc., with little consideration of the influence of alignment [15, 30]. Seam cutting can also be closely integrated with alignment [8, 15, 29]. The main idea of these methods is that images need to be aligned well only in the local area that the seam line crosses. Seam quality assessment is used to guide the selection of a homography estimated from a set of point correspondences. In fact, these methods rely on the selection of a local homography or correspondence set. In some complex scenes, the optimal selection is difficult to find if the local alignment region contains multiple planes, because these methods use only one homography to handle the whole scene.

To the best of our knowledge, few works consider how to deal with scenes with strong structural regularities in the form of multiple distinct planes. Because one global or local homography cannot fit a complex scene, dual-homography warping [7] clusters the match points into two groups to estimate two homographies for scenes containing two predominant planes: a distant plane and a ground plane. However, the difference between this rough plane partition and the true planes of the scene may cause misalignment and structure deformations, and performance may degrade in complex scenes with more than two distinct planes.

Therefore, this paper proposes a smoothly planar homography model for image stitching. To obtain the plane warps, we propose to automatically detect plane points and segment the scene into piecewise planar regions. Then adaptive plane-based warps are estimated and integrated to perform local alignment. Once the images are geometrically aligned, a misalignment-guided seam is calculated to perform seamless stitching. This model can handle more than two distinct planes with large parallax. Figure 1 gives an example of the proposed method. Thus, the contribution of this paper is twofold:

  • We propose a multi-plane homography estimation and integration strategy to handle the complex scene with multiple dominant planes and achieve plausible stitching.

  • We propose a novel seam estimation method guided by alignment error to deal with parallax, which provides seamless image stitching.

2 Related Works

Numerous works have been devoted to image stitching. An exhaustive review was presented in [22]. Here, we give a brief survey of related works.

Global Parametric Models. Early methods adopt global parametric warps (e.g. affine or projective warps) to align images. Their performance degrades when images are taken from different viewpoints or when scenes are not roughly planar. To remedy the deficiency of a single warp, Gao et al. [7] proposed a dual-homography warp to stitch images. However, it only suits simple scenes with two planes, a ground plane and a distant plane.

Spatially Varying Warps. Spatially varying warps are proposed to handle complex scenes. Followed by composition techniques, these methods work well for images with moderate parallax. They can be roughly classified into two categories: local warps and mesh optimization-based warps. The former estimate multiple local transformations to align images locally, such as smoothly varying affine warps [17], shape-preserving half-projective (SPHP) warps [4], as-projective-as-possible (APAP) warps [28] and their variants [5, 14, 18]. The latter apply a mesh optimization model with a series of feature constraints after general warps, such as feature alignment [23, 27, 30] and photometric alignment [16]. These methods do not account for the particularity of multi-plane scenes [19], namely that different plane regions undergo different transformations, and thus they may fail to produce satisfactory stitching results.

Seam-Assist Stitching Methods. To stitch images with large parallax, several seam-assist methods have been proposed. Unlike methods that perform seam cutting after image alignment [13], Gao et al. [8] proposed a seam-driven image stitching method, which evaluates the seam-cut quality to guide the selection of the optimal transformation. Building on it, the parallax-tolerant stitching model [29] and the seam-guided local alignment model [15] were proposed to improve the stitching performance. However, these methods may align only one local region at a time, and the resulting seam may accidentally pass through other regions with large misalignments.

Fig. 2.

The workflow of our smoothly planar stitching algorithm.

3 Smoothly Planar Stitching

The proposed stitching algorithm is illustrated in Fig. 2. The planar regions are estimated from the detected planar points, and the multiple planar homographies are then integrated by the designed weighting strategy for smooth stitching. To handle parallax, alignment errors are used to guide the seam line estimation for seamless composition.

3.1 Planar Region Estimation

For real scenes with multiple planes, we use a robust multi-structure geometric fitting method, called random cluster models sampler (RCMSA) [20], to detect planes from the point correspondences. RCMSA adopts random cluster models to perform hypothesis generation using subsets larger than minimal. Compared with random hypothesis generation, RCMSA provides good hypotheses, which are less affected by the vagaries of fitting on minimal subsets.

Consider two views of a multi-plane scene and N point matches \(P=\{{p}_i\}_{i=1}^{N}\) across the two images, where each \({p}_i=(\mathbf {x}_i, \mathbf {x}_i^{'})\) denotes a pair of match points in homogeneous coordinates. RCMSA partitions the match points into different planes (structures) and removes false matches. The number of structures is unknown and must also be estimated.

RCMSA works as follows. Random cluster models are first used as a hypothesis sampler to generate clusters for the hypotheses \(\varTheta = \{\theta _c\}_{c=1}^K\). Next, an annealing method based on graph cuts is employed to optimize the fitting of structures. A graph \( \mathcal {G}=(\mathcal {V}, \mathcal {N})\) is built on the match points, where the vertex set is \(\mathcal {V}=P\) and the edge set \(\mathcal {N}\) is constructed from the Delaunay triangulation of P. The goal is to assign each pair of match points \({p}_i\) to one of the structures in \(\varTheta \), denoted by labels \(L = \{l_i\}_{i=1}^N\). That is, \(l_i = k\) with \(k \in \{1,2,...,K\}\) if \({p}_i\) belongs to the k-th structure, and \(l_i = 0\) if \({p}_i\) is an outlier. The energy function is defined as

$$\begin{aligned} E(\varTheta , L) = \sum \limits _{i = 1}^N {D( {{{p}_i},{l_i}} )} + \sum \limits _{\left\langle {i,j} \right\rangle \in \mathcal {N}} {V( {{l_i},{l_j}} )}, \end{aligned}$$
(1)

where \(D({p}_i, l_i)\) is the data cost and constructed as

$$\begin{aligned} D( {{p}_i},{l_i} ) = \left\{ \begin{array}{cl} r({{p}_i},{\theta _{l_i}})^2, &{} \text {if } {l_i} \in \{ {1,2,...,K} \}\\ {\eta }, &{} \text {if } {l_i} = 0 \end{array} ,\right. \end{aligned}$$
(2)

where \(r({p}_i,{\theta _{l_i}})\) is the absolute residual of \({p}_i\) to structure \(\theta _{l_i}\), and \(\eta \) is the penalty if \({p}_i\) is an outlier. The smoothness cost V is defined as

$$\begin{aligned} V( {l_i},{l_j} ) = \left\{ \begin{array}{cl} 0, &{} \text {if } l_i = l_j \\ 1, &{} \text {if } l_i \ne l_j \end{array} ,\right. \end{aligned}$$
(3)

The solution \(L = \{l_i\}\) can be obtained by \(\alpha \)-expansion [1].
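To make Eqs. (1)-(3) concrete, the following minimal sketch evaluates the energy of a given labeling. It only computes the energy and does not perform the \(\alpha \)-expansion minimization. The function name `labeling_energy` and the input layout (a per-point list of residuals, and neighbor edges passed in as index pairs rather than built from a Delaunay triangulation) are assumptions made for illustration.

```python
def labeling_energy(residuals, labels, edges, eta):
    """Evaluate the energy of Eqs. (1)-(3) for a fixed labeling.

    residuals[i][k - 1] holds r(p_i, theta_k) for structure k (hypothetical layout).
    labels[i] is in {0, 1, ..., K}, where 0 marks an outlier.
    edges is a list of (i, j) neighbor pairs (assumed given).
    eta is the outlier penalty.
    """
    # Data cost, Eq. (2): squared residual for inliers, eta for outliers.
    data = 0.0
    for i, label in enumerate(labels):
        if label == 0:
            data += eta
        else:
            data += residuals[i][label - 1] ** 2

    # Smoothness cost, Eq. (3): unit penalty for each edge with differing labels.
    smooth = sum(1 for i, j in edges if labels[i] != labels[j])

    return data + smooth
```

A lower energy indicates a labeling that fits the structures well while keeping neighboring matches on the same plane.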

In our implementation, RCMSA is applied iteratively to the outliers until the number of remaining outliers is small enough or only a few new plane points are detected. To refine the detection of plane points, the projective distance is employed to adjust the plane labels of points. If the projective distance of a point under the k-th planar homography \(\mathbf {H}_k\) is less than \(\delta \), the point is reassigned to this plane, i.e. \(l_i = k\), where \(\mathbf {H}_k\) is estimated from the correspondences in plane \(\theta _k\). In this way, the points are assigned to their planes.
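The label refinement step can be sketched as follows. This is an illustrative reading, not the authors' implementation: the names `project` and `refine_labels` are hypothetical, each homography is assumed to map \(\mathbf {x}^{'}\) to \(\mathbf {x}\) as in the alignment error defined later, the refinement is assumed to apply to all points, and ties between several qualifying planes are assumed to be resolved in favor of the smallest distance.

```python
import math


def project(H, pt):
    """Apply a 3x3 homography H (nested lists) to a 2D point (x, y)."""
    x, y = pt
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def refine_labels(matches, homographies, labels, delta):
    """Reassign a match to plane k when its projective distance is below delta.

    matches[i] = (x_i, x_i_prime) as 2D pixel coordinates.
    homographies[k - 1] is H_k, assumed to map x' to x.
    Assumption: if several planes pass the test, the closest one wins.
    """
    refined = list(labels)
    for i, (x, x_prime) in enumerate(matches):
        best_k, best_d = None, delta
        for k, H in enumerate(homographies, start=1):
            px, py = project(H, x_prime)
            d = math.hypot(x[0] - px, x[1] - py)
            if d < best_d:
                best_k, best_d = k, d
        if best_k is not None:
            refined[i] = best_k
    return refined
```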

A simple approach is to warp each plane by its corresponding transformation. However, this may leave gaps between the plane regions or make them overlap. Instead, we partition the images into two kinds of regions: plane regions and transition regions. Plane regions are warped by the homography estimated from the point correspondences belonging to the current plane. Transition regions are transformed by the local weighted homography, detailed below, to keep continuity along the boundaries of neighboring plane regions. Here, the neighborhood of each plane point, e.g. pixels at a distance less than \(\varepsilon \), is regarded as a plane region, and the rest is the transition region. Figure 3 shows the detection of plane points by RCMSA and the partition of plane regions.

Fig. 3.

Plane region estimation. (a) Detection of multiple planar points based on RCMSA; (b) Estimation of planar and transition regions. Planar regions are highlighted red. (Color figure online)
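The partition into plane and transition regions can be illustrated with a brute-force sketch. The function name `region_of`, the dictionary layout of the plane points, and the rule that the nearest plane point decides when several planes qualify are assumptions for illustration only. A practical implementation would likely use a distance transform or a spatial index instead of a double loop.

```python
import math


def region_of(pixel, plane_points, eps):
    """Return the plane label of a pixel, or 0 for the transition region.

    plane_points maps each plane label k (k >= 1) to a list of (x, y) plane points.
    A pixel belongs to plane k if it lies at a distance less than eps
    from one of the points of that plane.
    Assumption: if several planes qualify, the nearest plane point decides.
    """
    best_label, best_dist = 0, eps
    for label, points in plane_points.items():
        for (x, y) in points:
            d = math.hypot(pixel[0] - x, pixel[1] - y)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label
```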

3.2 Smoothly Planar Homography

For transition regions, the local weighted homography is employed to maintain continuity and smoothness between neighboring plane regions. Given a pixel p in a transition region, the warp is estimated as

$$\begin{aligned} \mathbf {H}_p = \sum _{i=1}^{K} { \alpha _i \mathbf {H}}_i, \end{aligned}$$
(4)

where \(\mathbf {H}_i\) is the homography of the i-th plane, K is the number of plane regions, and \(\alpha _i\) denotes the weight that adjusts the contribution of each plane homography. The weight is computed from spatial proximity with a Gaussian kernel,

$$\begin{aligned} \alpha _i = \text {exp}(- d_i / \sigma ^2), \end{aligned}$$
(5)

where \(d_i\) denotes the distance to the closest pixel in the i-th planar region, and \(\sigma \) is set between 4 and 8. To mitigate projective distortions, the global similarity constraint proposed in [24] is integrated with the local homography. The procedure of smoothly planar homography is given in Algorithm 1.

Algorithm 1. Smoothly planar homography.
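Equations (4) and (5) can be sketched as below. The function names are hypothetical, and the sketch assumes that every plane homography is scaled consistently (for example so that its bottom-right entry equals 1), because a homography is only defined up to scale and the weighted sum depends on the scale of each term. The global similarity constraint of [24] is not included.

```python
import math


def blend_weights(dists, sigma):
    """Eq. (5): alpha_i = exp(-d_i / sigma^2) for each plane region."""
    return [math.exp(-d / sigma ** 2) for d in dists]


def blended_homography(homographies, dists, sigma):
    """Eq. (4): H_p = sum_i alpha_i * H_i, returned as a 3x3 nested list.

    homographies[i] is H_i (assumed normalized to a common scale).
    dists[i] is d_i, the distance from the pixel to the i-th planar region.
    """
    alphas = blend_weights(dists, sigma)
    H_p = [[0.0] * 3 for _ in range(3)]
    for alpha, H in zip(alphas, homographies):
        for r in range(3):
            for c in range(3):
                H_p[r][c] += alpha * H[r][c]
    return H_p


def warp_point(H, pt):
    """Apply a 3x3 homography to a 2D point; the overall scale of H cancels out."""
    x, y = pt
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)
```

Because the final division by the homogeneous coordinate removes any global scale, the weights do not need to sum to one for the warp to be well defined. A pixel close to one plane region is dominated by that plane's homography, while a pixel equidistant from two regions receives an even blend.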

3.3 Alignment-Guided Seamless Composition

After alignment, seam cutting plays an important role in producing a seamless mosaic, especially in large-parallax cases. To search for the optimal seam line between two images, differences in image color, gradient and edge map [15, 29] in the overlapping region are often adopted to construct the smoothness terms of the graph cut seam algorithm [11].

In fact, alignment error has a great influence on seam finding [30]. Pixels with large misalignment but similar colors confuse seam cutting and produce bad seams. A plausible seam should traverse low-texture and inconspicuous regions, and avoid pixels with large alignment errors or distinct structures such as edges. Therefore, we propose to integrate alignment error and edge difference to generate good seams.

For each match point, the alignment error is calculated as

$$\begin{aligned} e_x = { \Vert \mathbf {x}_i - \mathbf {H} \mathbf {x}_i^{'} \Vert }, \end{aligned}$$
(6)

where \((\mathbf {x}_i, \mathbf {x}_i^{'})\) is a pair of match points and \(\mathbf {H}\) is the corresponding plane homography.

From the point alignment errors, we generate a per-pixel error map by interpolation,

$$\begin{aligned} \begin{array}{cl} e_p = \sum \nolimits _x w_{p,x} e_x / {\sum \nolimits _x w_{p,x}} \\[2mm] w_{p,x} = exp( - { \Vert p - x \Vert } / \rho ^2 ), \end{array} \end{aligned}$$
(7)

where \(w_{p,x}\) is the weight computed from the distance between pixel p in the overlapping region and match point x, and \(\rho \) is a scale parameter set to 8. The interpolation uses the M match points closest to the pixel p. To reduce the influence of large alignment errors, we define the alignment term as

$$\begin{aligned} E_a = 1 - exp( - e_p^2 / (\tau )^2), \end{aligned}$$
(8)

where \(\tau \) is set to 0.003D and D denotes the length of the image diagonal. The smoothness cost function is

$$\begin{aligned} E_{i,j}(p) = E_a (E_c + E_e), \end{aligned}$$
(9)

where \(E_c\) is the color difference and \(E_e\) denotes the image edge probability difference computed by the structured edge detector [6]. The smoothness cost is incorporated into the graph cut seam finding algorithm [11] to search for a good seam. Then multi-band blending [3] is applied.
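The chain from Eq. (7) to Eq. (9) can be sketched as follows, taking the per-match errors \(e_x\) of Eq. (6) as given. The function names are hypothetical, and the value of M is left as a parameter because it is not stated here. The color and edge terms \(E_c\) and \(E_e\) are passed in as plain numbers, so the structured edge detector and the graph cut itself are outside the scope of this sketch.

```python
import math


def pixel_error(p, match_points, match_errors, rho=8.0, M=None):
    """Eq. (7): distance-weighted average of match errors around pixel p.

    match_points[i] is the (x, y) position of a match in the overlapping region,
    match_errors[i] is its alignment error e_x from Eq. (6).
    Only the M matches closest to p are used (all matches when M is None).
    """
    items = sorted(
        zip(match_points, match_errors),
        key=lambda it: math.hypot(p[0] - it[0][0], p[1] - it[0][1]),
    )
    if M is not None:
        items = items[:M]
    num = 0.0
    den = 0.0
    for (x, y), e in items:
        w = math.exp(-math.hypot(p[0] - x, p[1] - y) / rho ** 2)
        num += w * e
        den += w
    return num / den


def alignment_term(e_p, diagonal, tau_ratio=0.003):
    """Eq. (8): E_a = 1 - exp(-e_p^2 / tau^2), with tau = 0.003 * D."""
    tau = tau_ratio * diagonal
    return 1.0 - math.exp(-(e_p ** 2) / tau ** 2)


def smoothness_cost(e_a, e_c, e_e):
    """Eq. (9): E = E_a * (E_c + E_e)."""
    return e_a * (e_c + e_e)
```

Since \(E_a\) saturates at 1 for large errors and vanishes for well-aligned pixels, the product in Eq. (9) makes it cheap for the seam to cut through well-aligned regions even when color or edge differences are present there.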

4 Experiments

To verify the effectiveness of the proposed method, we test our algorithm on a series of challenging data and compare it with other stitching methods. The parameters of the compared methods are set as recommended in the respective papers. Given a pair of images, the keypoints are detected and matched by the deep matching algorithm [21] in our implementation.

Fig. 4.

Warp comparison. The images are warped by (a) global homography [2], (b) APAP [28], and (c) the proposed warping. Then, the alignment-guided seam composition is applied to these three results. For comparison, we highlight some details in the blue and green boxes. Errors are shown in red circles. (Color figure online)

4.1 Warping Performance

Figure 4 compares the proposed warp with two other common warp models, namely global warping and local warping. Here, the global homography [2] and APAP warps [28] are selected for comparison. After warping, the proposed alignment-guided seam composition is applied to each result for seamless composition. Figure 4(a) shows the result of the global method, which applies a single global homography to warp the images. On the one hand, a scene with multiple distinct planes cannot be represented by only one transformation, resulting in severe misalignments. On the other hand, even though seam cutting is applied, it cannot find well-aligned regions to pass through in some areas. Thus, the seam passes through misaligned regions and produces broken structures. APAP adopts multiple local homographies to align as many point matches as possible and improves the stitching performance, e.g. the green region in Fig. 4(b). However, due to the adverse influence of point matches lying in different planes (blue region) or of uneven and insufficient points (green region), it is hard to obtain an accurate warping model for good alignment, resulting in stitching errors. Our smoothly planar homography adopts two different warping models to align planar and transition regions, which provides satisfactory local alignment. Together with our alignment-guided seam composition, the estimated seam finds locally well-aligned regions and thus avoids regions with large parallax.

Fig. 5.

Seam composition. (a) Enblend, (b) The proposed seam composition without alignment guidance, (c) The proposed seam composition. For comparison, we highlight some details in the blue and green boxes. Seam errors are shown in red circles. (Color figure online)

4.2 Composition Performance

Figure 5 shows the seam composition performance of Enblend, our method without alignment guidance, and our method with alignment guidance. In the enlarged views, Enblend produces severe seam errors, e.g. the missing buoy and the distorted construction. In fact, Enblend only considers the color difference and gradient difference, which may lead to ghosts or errors. By adding edge or boundary constraints, our method without alignment guidance provides a relatively better result, but the seam error is still obvious, mainly because of the large mismatch on the red concrete columns. With the constraint of alignment error, the proposed seam composition avoids the regions with large alignment errors and provides satisfactory seam composition.

Fig. 6.

Comparison with spatially varying methods, i.e. (a) ICE, (b) APAP [28], (c) SPHP [4] and (d) the proposed method. For comparison, we highlight some details in the green boxes. Errors are shown in red circles. (Color figure online)

Fig. 7.

Comparison with seam-assist stitching methods, i.e. (a) ICE, (b) parallax-tolerant stitching [29] and (c) the proposed method. For comparison, we highlight some details in the green boxes. Errors are highlighted in red circles. (Color figure online)

4.3 Comparison with Other Methods

Figure 6 gives the comparison with some spatially varying methods, including Image Composite Editor (ICE), APAP [28], SPHP [4] and our method. Some details are provided in enlarged views for comparison. Although ICE uses a global transformation, it provides a good stitching result because of its advanced image composition. However, obvious alignment errors remain, as shown in the red circle. APAP adopts local homographies to align as many correspondences as possible in the overlapping region. Owing to the rich correspondences, it provides satisfactory alignment performance, but it suffers from local distortions (shown in the red circle) caused by feature matches in multiple planes. SPHP produces obvious stitching errors, because the applied warps cannot represent the multi-plane image transformation well. The estimated seam may accidentally pass through regions with misalignments and thus generate broken structures. Our smoothly planar homography method uses different models to process planar regions and transition regions and thus aligns the different planar regions well. Together with the alignment-guided seam composition, which finds locally well-aligned regions for composition, our method provides visually appealing stitching results.

Figure 7 provides the comparison with seam-assist stitching methods, including ICE and the parallax-tolerant stitching method [29]. In the ICE results, the seam cutting does not consider alignment errors and thus causes obvious broken structures. In parallax-tolerant stitching, the best homography is chosen for good local alignment. However, the resulting seam may still run into regions with large misalignment. In comparison, the proposed method provides satisfactory stitching results.

5 Conclusion

In this paper, we present a smoothly planar homography model for stitching images with multiple planes and large parallax. The plane and transition regions are detected based on the multiple plane correspondences, and warped with their respective transformations. The multiple plane homographies are integrated to perform smooth stitching on transition regions. In addition, the alignment-guided seam composition is adopted to perform seamless stitching. Experiments demonstrate the effectiveness and robustness of the proposed method and its state-of-the-art stitching performance. In the future, more advanced plane detection methods may be beneficial for accurate detection of plane regions.