Pattern Recognition

Volume 118, October 2021, 108019

Edge-guided Composition Network for Image Stitching

https://doi.org/10.1016/j.patcog.2021.108019

Highlights

  • We propose a novel deep learning framework for composition in image stitching.

  • We illustrate the importance of preserving structure consistency in image composition, and leverage the structure prior provided by a proposed perceptual edge branch to further enhance composition performance.

  • We build a Real Image Stitching Dataset (RISD), which is the first general-purpose benchmark for image stitching training.

Abstract

Panorama creation remains challenging in consumer-level photography because of the varying conditions of image capturing. A long-standing problem is the presence of artifacts caused by structure-inconsistent image transitions. Since it is difficult to achieve perfect alignment in unconstrained shooting environments, especially with parallax and object movements, image composition becomes a crucial step in producing artifact-free stitching results. Current energy-based seam-cutting approaches to image composition are limited by hand-crafted features, which are not discriminative and adaptive enough to robustly create structure-consistent image transitions. In this paper, we present the first end-to-end deep learning framework, named Edge-Guided Composition Network (EGCNet), for the composition stage in image stitching. We cast the whole composition stage as an image blending problem and aim to regress the blending weights that seamlessly produce the stitched image. To better preserve structure consistency, we exploit perceptual edges to guide the network with an additional geometric prior. Specifically, we introduce a perceptual edge branch to integrate edge features into the model and propose two edge-aware losses for edge guidance. Meanwhile, we gathered a general-purpose dataset for image stitching training and evaluation (namely, RISD). Extensive experiments demonstrate that our EGCNet produces plausible results with less running time, and outperforms traditional methods, especially under circumstances of parallax and object motion.

Introduction

Image stitching plays an important role in many applications such as remote image mosaicking [1], panoramic videos [2], virtual reality [3], and panorama creation software. Conventionally, image stitching is the process of combining multiple images with overlapping areas to produce a wide-view panorama [4]. There are two ways of mitigating artifacts in the stitched image: (1) improving alignment by increasing the available degrees of freedom [5], [6], [7], [8], [9]; (2) hiding misalignments with image composition techniques such as selecting better seams in the overlapping areas. However, artifacts caused by large parallax and moving objects can hardly be concealed by better alignment, so improving the image composition is the only remedy in these cases.

The most widely used composition techniques for image stitching are seam-cutting [10], [11], [12], [13] and blending [14], [15]. The former aims at finding an invisible seam in the overlapping region of the aligned images, and the latter smooths the transitions between images near the seam. Seam-cutting can be viewed as a {0,1} labeling problem that creates a mapping between pixels in the composite and source images. The optimal seam is found by minimizing an energy function that describes the differences between the images in the overlapping region. Many efforts have been devoted to improving seam-cutting performance by penalizing the photometric difference with various energy functions. The Euclidean-metric color difference and gradient difference [12], [16], [17] were the earliest attempts. Later, other features such as texture complexity [18], Canny edges [19], [20], and saliency [21] were also taken into consideration. However, the performance of these methods is limited by weak hand-crafted features. Because it is difficult to find the best combination and to balance the weights of these features, traditional methods often lack adaptivity to varying input conditions. Moreover, these hand-crafted features gather only very local information and lack strong geometric guidance, and thus may produce structure-inconsistent results, such as tearing or cropping recognizable objects (Figure 1). In terms of computation time, the energy-based composition methods all have high time complexity, since they solve with min-cut or dynamic programming, and are far from real-time. Therefore, it is highly desirable to have a more efficient image composition method that is robust to varying input conditions and capable of utilizing features adaptively.
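To make the energy-minimization view concrete, the following toy sketch finds a vertical seam in an overlap by dynamic programming over a per-pixel photometric-difference energy. This illustrates the classic seam-cutting formulation discussed above, not EGCNet itself; the function name and the squared-color-difference energy are illustrative choices.

```python
import numpy as np

def find_seam(img_a, img_b):
    """Toy energy-based seam cut: find the vertical seam minimizing the
    squared color difference between two aligned overlap regions, via
    dynamic programming (a classic formulation, not the paper's method)."""
    # Energy: per-pixel photometric difference in the overlapping region.
    energy = ((img_a.astype(float) - img_b.astype(float)) ** 2).sum(axis=-1)
    h, w = energy.shape
    cost = energy.copy()
    # Forward pass: accumulate the cheapest connected path from the top.
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 2, w)
            cost[y, x] += cost[y - 1, lo:hi].min()
    # Backward pass: trace the minimal-cost column for each row.
    seam = np.zeros(h, dtype=int)
    seam[-1] = int(cost[-1].argmin())
    for y in range(h - 2, -1, -1):
        lo, hi = max(seam[y + 1] - 1, 0), min(seam[y + 1] + 2, w)
        seam[y] = lo + int(cost[y, lo:hi].argmin())
    return seam
```

Min-cut solvers generalize this to arbitrary 2D seams, but the cost structure is the same: the seam is pushed through regions where the two images already agree.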

Recently, convolutional neural networks (CNNs) have greatly promoted the development of computer vision. With the help of CNNs, more discriminative and adaptive image features can be extracted, and more robust performance can be reached with data-driven methods. To the best of our knowledge, the only learning-based work related to image/video stitching is by Lai et al. [22], which learns a smooth spatial interpolation between input videos. Different from theirs, we focus on the image composition process and cast the whole process as a blending problem. To seamlessly blend the pre-registered images into an artifact-free stitched image, we leverage CNNs to develop an Edge-Guided Composition Network (EGCNet) that regresses the blending weights. The insight is that the pre-registered images should be blended at regions where they are highly similar, to avoid inconsistent and noticeable image transitions. To better learn this nature of composition, we use a Siamese network [23] for feature extraction, and combine the feature maps not only through concatenation but also through a group-wise correlation layer that explicitly provides a similarity measurement, so the following layers can predict blending weights from the combined features. Our work can be seen as continuing the second line of image stitching research.
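The group-wise correlation idea can be sketched as follows: split the channels of the two Siamese feature maps into groups and take a per-group inner product at each spatial location, yielding an explicit similarity volume. This is a minimal NumPy sketch under assumed tensor shapes (C, H, W); the actual layer in the paper may differ in normalization and grouping.

```python
import numpy as np

def groupwise_correlation(feat_a, feat_b, num_groups):
    """Sketch of a group-wise correlation layer: split C channels into
    `num_groups` groups and take the mean channel-wise product per group,
    producing a (G, H, W) similarity map between the two feature maps."""
    c, h, w = feat_a.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    ga = feat_a.reshape(num_groups, c // num_groups, h, w)
    gb = feat_b.reshape(num_groups, c // num_groups, h, w)
    return (ga * gb).mean(axis=1)  # average inner product within each group
```

Compared with plain concatenation, such a layer hands the decoder an explicit per-location similarity signal, which matches the insight that blending should happen where the two images agree.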

Training such a network to predict generic composition for image stitching requires a sufficiently large training set. However, the existing image stitching datasets are all too small for effective training. Thus, to provide sufficient training data, we build a Real Image Stitching Dataset (RISD) of 500 pairs of images with various parallax levels and scene types. Each sample of RISD contains a pre-registered image pair and a visually plausible composite image as ground truth. With the help of RISD, our EGCNet can be trained in an end-to-end manner. Meanwhile, since preserving structure consistency is of great significance in the composition process, we explicitly exploit perceptual edges as guidance. To achieve this, we build a two-branch network with a composition branch and a perceptual edge branch. The perceptual edge branch detects edges of the input pre-registered images and guides the composition branch in two ways. On the one hand, edge features generated in the perceptual edge branch are embedded into the composition branch to provide a structure prior. On the other hand, edge maps produced by the perceptual edge branch are utilized in our proposed edge-aware smoothness loss and detail-preserving loss, which guide the model to pay more attention to regions with strong structural information. The whole model is trained in a joint supervised and self-supervised manner: supervised by the ground-truth composite images in RISD and self-supervised by the edge-aware losses designed around the task characteristics.
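The flavor of an edge-aware smoothness term can be sketched as below: penalize spatial variation of the predicted blending weights, but down-weight the penalty where the edge response is strong, so weight transitions are encouraged to align with structure. The exact losses in the paper are not specified here; this is a hypothetical formulation using an exponential edge gating, for illustration only.

```python
import numpy as np

def edge_aware_smoothness(weights, edge_map):
    """Hypothetical edge-aware smoothness penalty: discourage changes in
    the blending weights in flat regions, while letting them change
    freely where the edge map responds strongly (gate = exp(-edge))."""
    dx = np.abs(np.diff(weights, axis=1))   # horizontal weight changes
    dy = np.abs(np.diff(weights, axis=0))   # vertical weight changes
    gate_x = np.exp(-edge_map[:, 1:])       # small gate near strong edges
    gate_y = np.exp(-edge_map[1:, :])
    return (dx * gate_x).mean() + (dy * gate_y).mean()
```

Under this sketch, a weight transition placed on a strong edge costs almost nothing, while the same transition in a textureless region is penalized, which is one way to push seams toward structurally consistent locations.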

In summary, our main contributions are as follows:

  • 1. We propose a novel end-to-end deep learning framework for composition in image stitching.

  • 2. We illustrate the importance of preserving structure consistency in image composition, and leverage the structure prior provided by a proposed perceptual edge branch to further enhance composition performance.

  • 3. We build a Real Image Stitching Dataset (RISD), which is the first general-purpose benchmark for image stitching training.

Section snippets

Traditional Composition Methods for Image Stitching

After warping the images to align with each other, the simplest way to reduce unnatural transitions between images is to smooth the overlapping region, for example by feathering or center-weighting [4], [16], [24]. These methods take a weighted average at each pixel, but they often fail because illumination differences, misalignments, and blurring remain visible.
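The weighted-average idea can be sketched in a few lines. This toy version feathers a horizontal overlap with a linear ramp (a stand-in for the distance-to-border weights of center-weighting); real implementations derive the weights from each pixel's distance to its image boundary.

```python
import numpy as np

def feather_blend(img_a, img_b):
    """Toy feathering over a horizontal overlap: img_a's weight ramps
    linearly from 1 (left) to 0 (right) and the two images are averaged
    with complementary weights, a minimal sketch of feathering."""
    h, w = img_a.shape[:2]
    ramp = np.linspace(1.0, 0.0, w)[None, :, None]  # broadcast to (H, W, C)
    return ramp * img_a.astype(float) + (1.0 - ramp) * img_b.astype(float)
```

Because every overlap pixel is a mixture of both inputs, any residual misalignment shows up as ghosting, which is exactly the failure mode noted above.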

Better image composition algorithms have been proposed over the years, but the main idea remains the same. The

Method

In this section, we first introduce the overall framework. Then we present the details of the architecture and the loss functions. Finally, the construction method of the training dataset RISD is presented.

Implementation Details

In the first training stage, following [37], [38], we mix augmented data of BSDS500 [39] with the flipped PASCAL VOC Context dataset [40] to pre-train the perceptual edge branch. Stochastic gradient descent (SGD) randomly samples minibatches of 10 images in each iteration. The initial learning rate is set to 10⁻⁴ and divided by 10 after every 10k iterations. Momentum and weight decay are set to 0.9 and 2×10⁻⁴, respectively. For data augmentation, we adopt the same strategy as RCF [37].
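The learning-rate schedule described above (divide by 10 after every 10k iterations) is a plain step decay, which can be written as a one-line helper. The base rate of 1e-4 is taken from the text; the function name is illustrative.

```python
def step_lr(base_lr, iteration, step=10_000, gamma=0.1):
    """Step-decay schedule as described in the text: start at base_lr
    and multiply by gamma (i.e., divide by 10) every `step` iterations."""
    return base_lr * gamma ** (iteration // step)
```

For example, `step_lr(1e-4, 0)` gives 1e-4, and after 10k iterations the rate drops to 1e-5, then 1e-6 after 20k, and so on.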

In the

Limitations and future work

The proposed EGCNet performs well in a wide range of input scenarios with parallax or object movements. However, it still has some limitations: (1) Our method does not learn an overall luminance adjustment between the input images, which may cause noticeable brightness transitions under circumstances of large exposure difference (the first row of Figure 16), even though it preserves structure consistency well. (2) When the parallax is too large, or the input pre-registered images barely have

Conclusion

In this paper, we proposed an efficient EGCNet to perform the composition stage of image stitching in an end-to-end learning-based manner. EGCNet takes two unordered pre-registered images as input and learns to predict the blending weights that composite them into an artifact-free panorama. To better preserve structure consistency, we built a perceptual edge branch to explicitly provide edge guidance and further improve performance. The edge guidance is incorporated in two ways, one is

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the Key Project of the National Natural Science Foundation of China under Grant 61731009, the NSFC-RGC under Grant 61961160734, the National Natural Science Foundation of China under Grant 61871185, and the Science Foundation of Shanghai under Grant 20ZR1416200.

Qinyan Dai received the B.Sc. Degree in Computer Science and Technology from East China Normal University, Shanghai, China, in 2019. She is pursuing the masters degree at the same university. Her research interests include image stitching and image restoration.

References (44)

  • M. Xia et al., Globally consistent alignment for planar mosaicking via topology analysis, Pattern Recognition (2017).
  • T. Xiang et al., Image stitching by line-guided local warping with global similarity constraint, Pattern Recognition (2018).
  • Y. Chen et al., Semeda: Enhancing segmentation precision with semantic edge aware loss, Pattern Recognition (2020).
  • V.R. Gaddam et al., Tiling in interactive panoramic video: Approaches and evaluation, IEEE Transactions on Multimedia (2016).
  • Q. Zhao, L. Wan, W. Feng, J. Zhang, Cube2video: Navigate between cubic panoramas in real-time, IEEE Transactions on...
  • R. Szeliski, Image alignment and stitching: A tutorial, Foundations & Trends® in Computer Graphics & Vision (2006).
  • J. Zaragoza et al., As-projective-as-possible image stitching with moving DLT, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2013).
  • C.-C. Lin et al., Adaptive as-natural-as-possible image stitching, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015).
  • Y.-S. Chen et al., Natural image stitching with the global similarity prior, ECCV (2016).
  • K.-Y. Lee et al., Warping residual based image stitching for large parallax, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020).
  • A. Eden et al., Seamless image stitching of scenes with large motions and exposure differences, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06) (2006).
  • J. Jia et al., Image stitching using structure deformation, IEEE Transactions on Pattern Analysis and Machine Intelligence (2008).
  • V. Kwatra et al., Graphcut textures: image and video synthesis using graph cuts, ACM Trans. Graph. (2003).
  • J. Gao et al., Seam-driven image stitching, Eurographics (Short Papers) (2013).
  • P.J. Burt et al., A multiresolution spline with application to image mosaics, ACM Trans. Graph. (1983).
  • P. Pérez et al., Poisson image editing, ACM Trans. Graph. (2003).
  • A. Agarwala et al., Interactive digital photomontage, SIGGRAPH 2004 (2004).
  • J. Gao et al., Seam-driven image stitching, Eurographics (2013).
  • L. Li et al., Edge-enhanced optimal seamline detection for orthoimage mosaicking, IEEE Geoscience and Remote Sensing Letters (2018).
  • F. Zhang et al., Parallax-tolerant image stitching, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014).
  • K. Lin et al., Seagull: Seam-guided local alignment for parallax-tolerant image stitching, European Conference on Computer Vision (2016).
  • N. Li et al., Perception-based seam cutting for image stitching, Signal, Image and Video Processing (2018).


    Faming Fang received the Ph.D. degree in computer science from East China Normal University, Shanghai, China, in 2013. He is currently a Professor with the School of Computer Science and Technology, East China Normal University. His main research area is image processing using the variational methods, PDEs and deep learning.

    Juncheng Li received the B.Sc. degree in Computer Science from Jiangxi Normal University, Nanchang, China, in 2016. He is currently a Ph.D. student with the School of Computer Science and Technology, East China Normal University, Shanghai, China. His main research interests are image processing, machine learning and deep learning.

    Guixu Zhang received the Ph.D. degree from the Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou, China, in 1998. He is currently a Professor with the Department of Computer Sci ence and Technology, East China Normal University, Shanghai, China. His research interests include hyperspectral remote sensing and image processing.

    Aimin Zhou received the B.Sc. and M.Sc. degrees in computer science from Wuhan University, Wuhan, China, in 2001 and 2003, respectively, and the Ph.D. degree in computer science from the University of Essex, Colchester, U.K., in 2009. He is currently a Professor with the School of Computer Science and Technology, East China Normal University, Shanghai, China. He has authored over 70 peer-reviewed papers. His current research interests include machine learning, image processing, and their applications.
