Edge-guided Composition Network for Image Stitching
Introduction
Image stitching plays an important role in many applications, such as remote image mosaicking [1], panoramic videos [2], virtual reality [3], and panorama creation software. Conventionally, image stitching is the process of combining multiple images with overlapping areas to produce a wide-view panorama [4]. There are two ways of mitigating artifacts in the stitched image: (1) improving alignment by increasing the available degrees of freedom [5], [6], [7], [8], [9]; (2) hiding misalignments with image composition techniques, such as selecting better seams in the overlapped areas. However, artifacts caused by large parallax and moving objects can hardly be concealed by better alignment, so improving image composition is the only remedy in these cases.
The most widely used composition techniques for image stitching are seam-cutting [10], [11], [12], [13] and blending [14], [15]. The former aims to find an invisible seam in the overlapped region of the aligned images, while the latter smooths the transitions between images near the seam. Seam-cutting can be viewed as a labeling problem that creates a mapping between pixels in the composite and source images. The optimal seam is found by minimizing an energy function that describes the differences between images in the overlapped region. Many efforts have been devoted to improving seam-cutting performance by penalizing the photometric difference with various energy functions. Euclidean-metric color difference and gradient difference [12], [16], [17] were the earliest attempts. Later, other features such as texture complexity [18], Canny edges [19], [20], and saliency [21] were also taken into consideration. However, the performance of these methods is limited by the weak hand-crafted features. Because it is difficult to find the best combination of these features and to balance their weights, traditional methods often lack adaptivity to varying input conditions. Moreover, these hand-crafted features gather only very local information and lack strong geometric guidance, and may thus produce structurally inconsistent results, such as tearing or cropping recognizable objects (Figure 1). In terms of computation time, energy-based composition methods all have high time complexity, as they are solved with min-cut or dynamic programming, and are far from real-time. Therefore, a more efficient image composition method that is robust to varying input conditions and capable of utilizing features adaptively is highly desirable.
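To make the dynamic-programming variant mentioned above concrete, the sketch below finds a minimal-cost vertical seam through a per-pixel difference map of the overlap region. This is a generic textbook formulation, not the exact energy function of any of the cited methods:

```python
def find_seam(cost):
    """Minimal-cost vertical seam through a cost map via dynamic programming.

    cost: 2-D list (rows x cols) of per-pixel photometric differences
    between the two aligned images in the overlap region.
    Returns one column index per row describing the seam path.
    """
    rows, cols = len(cost), len(cost[0])
    # dp[r][c] = minimal accumulated cost of any seam ending at (r, c)
    dp = [row[:] for row in cost]
    for r in range(1, rows):
        for c in range(cols):
            lo, hi = max(0, c - 1), min(cols - 1, c + 1)
            dp[r][c] += min(dp[r - 1][lo:hi + 1])
    # Backtrack from the cheapest endpoint in the last row.
    seam = [min(range(cols), key=lambda c: dp[-1][c])]
    for r in range(rows - 2, -1, -1):
        c = seam[-1]
        lo, hi = max(0, c - 1), min(cols - 1, c + 1)
        seam.append(min(range(lo, hi + 1), key=lambda c2: dp[r][c2]))
    seam.reverse()
    return seam
```

Graph-cut formulations generalize this to arbitrary seam topologies, at the price of the higher time complexity noted above.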
Recently, convolutional neural networks (CNNs) have greatly promoted the development of computer vision. With the help of CNNs, more discriminative and adaptive image features can be extracted, and more robust performance can be achieved with data-driven methods. To the best of our knowledge, the only learning-based work related to image/video stitching is by Lai et al. [22], which learns a smooth spatial interpolation between input videos. Different from theirs, we focus on the image composition process and cast the whole process as a blending problem. To seamlessly blend the pre-registered images into an artifact-free stitched image, we leverage CNNs to develop an Edge-guided Composition Network (EGCNet) that regresses the blending weights. The insight is that the pre-registered images should be blended at regions where they are highly similar, to avoid inconsistent and noticeable image transitions. To better learn this nature of composition, we use a Siamese network [23] for feature extraction, and combine the feature maps not only through concatenation but also through a group-wise correlation layer that explicitly provides a similarity measurement, so that the following layers can predict blending weights from the combined features. Our work can be seen as continuing the second line of image stitching research.
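The idea of a group-wise correlation layer can be illustrated with a minimal sketch. The function below is a simplification, not EGCNet's actual layer: it operates on the channel vectors of two feature maps at a single spatial location, and the group count and normalization here are assumptions. It splits the channels into groups and emits one dot-product similarity score per group:

```python
def groupwise_correlation(feat_a, feat_b, num_groups):
    """Per-location group-wise correlation between two feature vectors.

    feat_a, feat_b: lists of C channel activations at the same spatial
    position of the two pre-registered images (C divisible by num_groups).
    Returns num_groups similarity scores (mean per-group dot product),
    which downstream layers can consume alongside concatenated features.
    """
    c = len(feat_a)
    assert c == len(feat_b) and c % num_groups == 0
    size = c // num_groups
    scores = []
    for g in range(num_groups):
        # Dot product restricted to this group's channel slice.
        s = sum(feat_a[i] * feat_b[i] for i in range(g * size, (g + 1) * size))
        scores.append(s / size)
    return scores
```

Compared with plain concatenation, these scores give the following layers an explicit, compact similarity signal per channel group rather than forcing them to learn one from scratch.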
Training such a network to predict generic compositions for image stitching requires a sufficiently large training set. However, the existing image stitching datasets are all too small for effective training. Thus, to provide sufficient training data, we build a Real Image Stitching Dataset (RISD) of 500 pairs of images with various parallax levels and scene types. Each sample of RISD contains a pre-registered image pair and a visually plausible composite image as ground truth. With the help of RISD, our EGCNet can be trained in an end-to-end manner. Meanwhile, since preserving structure consistency is of great significance in the composition process, we explicitly exploit perceptual edges as guidance. To achieve this, we build a two-branch network with a composition branch and a perceptual edge branch. The perceptual edge branch detects edges of the input pre-registered images and guides the composition branch in two ways. On the one hand, edge features generated in the perceptual edge branch are embedded into the composition branch to provide a structure prior. On the other hand, edge maps produced by the perceptual edge branch are utilized in our proposed edge-aware smoothness loss and detail-preserved loss, which guide the model to pay more attention to regions with strong structural information. The whole model is trained jointly with supervision and self-supervision: it is supervised by the ground-truth composite images in RISD and self-supervised by the edge-aware losses designed according to the task characteristics.
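One plausible shape of an edge-aware smoothness term is sketched below. This is an assumed 1-D simplification, not the paper's exact loss: blending-weight gradients are penalized in proportion to local edge strength, discouraging blending transitions that cut across strong structures:

```python
def edge_aware_smoothness(weights, edges):
    """Assumed 1-D edge-aware smoothness term (illustrative only).

    weights: predicted blending weights along one image row, in [0, 1]
    edges:   perceptual edge strength per pixel, in [0, 1]
    Each blending-weight gradient is scaled by the stronger of the two
    adjacent edge responses, so weight changes are cheap in flat regions
    and expensive across detected structures.
    """
    loss = 0.0
    for i in range(1, len(weights)):
        grad = abs(weights[i] - weights[i - 1])
        loss += grad * max(edges[i], edges[i - 1])
    return loss / (len(weights) - 1)
```

Because this term needs no ground truth, it can act as the self-supervised part of a joint objective alongside a supervised reconstruction loss.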
In summary, our main contributions are as follows:
1. We propose a novel end-to-end deep learning framework for composition in image stitching.
2. We illustrate the importance of preserving structure consistency in image composition, and leverage the structure prior provided by the proposed perceptual edge branch to further enhance composition performance.
3. We build a Real Image Stitching Dataset (RISD), the first general-purpose benchmark for training image stitching models.
Section snippets
Traditional Composition Methods for Image Stitching
After warping the images to align with each other, the simplest way to reduce unnatural transitions between images is to smooth the overlapped region, e.g., by feathering or center-weighting [4], [16], [24]. These methods take a weighted average at each pixel, but they often fail because illumination differences, misalignments, and blurring remain visible.
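Feathering can be sketched in a few lines. This toy 1-D version is illustrative only (real feathering usually weights each pixel by its distance to the nearest image border); inside the overlap the blending weight ramps linearly from one image to the other, which makes the weighted-average nature of these methods explicit:

```python
def feather_blend(row_a, row_b, overlap_start, overlap_end):
    """Feather one image row: linear cross-fade inside [overlap_start,
    overlap_end), each source image used unchanged outside the overlap.
    Toy 1-D sketch of the weighted-average composition described above.
    """
    out = []
    span = overlap_end - overlap_start
    for x in range(len(row_a)):
        if x < overlap_start:
            out.append(row_a[x])          # only image A covers this pixel
        elif x >= overlap_end:
            out.append(row_b[x])          # only image B covers this pixel
        else:
            w = (x - overlap_start + 1) / (span + 1)  # weight of image B
            out.append((1 - w) * row_a[x] + w * row_b[x])
    return out
```

The ramp hides hard intensity steps, but any misalignment inside the overlap is averaged rather than avoided, which is exactly why ghosting and blurring survive this class of methods.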
Better image composition algorithms have been proposed over the years, but the main idea remains the same.
Method
In this section, we first introduce the overall framework. Then we present the details of the architecture and the loss functions. Finally, the construction method of the training dataset RISD is presented.
Implementation Details
In the first training stage, following [37], [38], we mix augmented data from BSDS500 [39] with the flipped PASCAL VOC Context dataset [40] to pre-train the perceptual edge branch. Each stochastic gradient descent (SGD) minibatch randomly samples 10 images. The initial learning rate is set to and divided by 10 after every 10k iterations. Momentum and weight decay are set to 0.9 and , respectively. For data augmentation, we adopt the same strategy as RCF [37].
Limitations and future work
The proposed EGCNet performs well in a wide range of input scenarios with parallax or object movement. However, it still has some limitations: (1) Our method does not learn the overall luminance adjustment between the input images, which may cause noticeable brightness transitions under large exposure differences (first row of Figure 16), even though structure consistency is well preserved. (2) When the parallax is too large, or the input pre-registered images barely have
Conclusion
In this paper, we proposed an efficient EGCNet that performs the composition stage of image stitching in an end-to-end, learning-based manner. EGCNet takes two unordered pre-registered images as input and learns to predict blending weights that composite them into an artifact-free panorama. To better preserve structure consistency, we built a perceptual edge branch to explicitly provide edge guidance and further improve performance. The edge guidance is incorporated in two aspects: one is
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the Key Project of the National Natural Science Foundation of China under Grant 61731009, the NSFC-RGC under Grant 61961160734, the National Natural Science Foundation of China under Grant 61871185, and the Science Foundation of Shanghai under Grant 20ZR1416200.
Qinyan Dai received the B.Sc. degree in Computer Science and Technology from East China Normal University, Shanghai, China, in 2019. She is currently pursuing a master's degree at the same university. Her research interests include image stitching and image restoration.
References (44)
- Globally consistent alignment for planar mosaicking via topology analysis, Pattern Recognition, 2017.
- Image stitching by line-guided local warping with global similarity constraint, Pattern Recognition, 2018.
- Semeda: Enhancing segmentation precision with semantic edge aware loss, Pattern Recognition, 2020.
- Tiling in interactive panoramic video: Approaches and evaluation, IEEE Transactions on Multimedia, 2016.
- Q. Zhao, L. Wan, W. Feng, J. Zhang, Cube2video: Navigate between cubic panoramas in real-time, IEEE Transactions on…
- Image alignment and stitching: A tutorial, Foundations and Trends in Computer Graphics and Vision, 2006.
- As-projective-as-possible image stitching with moving DLT, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
- Adaptive as-natural-as-possible image stitching, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- Natural image stitching with the global similarity prior, ECCV, 2016.
- Warping residual based image stitching for large parallax, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
- Seamless image stitching of scenes with large motions and exposure differences, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2006.
- Image stitching using structure deformation, IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Graphcut textures: Image and video synthesis using graph cuts, ACM Transactions on Graphics.
- Seam-driven image stitching, Eurographics (Short Papers).
- A multiresolution spline with application to image mosaics, ACM Transactions on Graphics.
- Poisson image editing, ACM Transactions on Graphics.
- Interactive digital photomontage, SIGGRAPH, 2004.
- Edge-enhanced optimal seamline detection for orthoimage mosaicking, IEEE Geoscience and Remote Sensing Letters.
- Parallax-tolerant image stitching, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Seagull: Seam-guided local alignment for parallax-tolerant image stitching, European Conference on Computer Vision (ECCV).
- Perception-based seam cutting for image stitching, Signal, Image and Video Processing.
Faming Fang received the Ph.D. degree in computer science from East China Normal University, Shanghai, China, in 2013. He is currently a Professor with the School of Computer Science and Technology, East China Normal University. His main research area is image processing using the variational methods, PDEs and deep learning.
Juncheng Li received the B.Sc. degree in Computer Science from Jiangxi Normal University, Nanchang, China, in 2016. He is currently a Ph.D. student with the School of Computer Science and Technology, East China Normal University, Shanghai, China. His main research interests are image processing, machine learning and deep learning.
Guixu Zhang received the Ph.D. degree from the Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou, China, in 1998. He is currently a Professor with the Department of Computer Science and Technology, East China Normal University, Shanghai, China. His research interests include hyperspectral remote sensing and image processing.
Aimin Zhou received the B.Sc. and M.Sc. degrees in computer science from Wuhan University, Wuhan, China, in 2001 and 2003, respectively, and the Ph.D. degree in computer science from the University of Essex, Colchester, U.K., in 2009. He is currently a Professor with the School of Computer Science and Technology, East China Normal University, Shanghai, China. He has authored over 70 peer-reviewed papers. His current research interests include machine learning, image processing, and their applications.