Pattern Recognition

Volume 118, October 2021, 108019

Edge-guided Composition Network for Image Stitching

https://doi.org/10.1016/j.patcog.2021.108019

Highlights

  • We propose a novel deep learning framework for composition in image stitching.

  • We illustrate the importance of preserving structure consistency in image composition, and leverage the structure prior provided by a proposed perceptual edge branch to further enhance composition performance.

  • We build a Real Image Stitching Dataset (RISD), which is the first general-purpose benchmark for image stitching training.

Abstract

Panorama creation remains challenging in consumer-level photography because of the varying conditions of image capturing. A long-standing problem is the presence of artifacts caused by structure-inconsistent image transitions. Since it is difficult to achieve perfect alignment in unconstrained shooting environments, especially with parallax and object movements, image composition becomes a crucial step in producing artifact-free stitching results. Current energy-based seam-cutting approaches to image composition are limited by hand-crafted features, which are not discriminative and adaptive enough to robustly create structure-consistent image transitions. In this paper, we present the first end-to-end deep learning framework, named Edge-Guided Composition Network (EGCNet), for the composition stage in image stitching. We cast the whole composition stage as an image blending problem and aim to regress the blending weights that seamlessly produce the stitched image. To better preserve structure consistency, we exploit perceptual edges to guide the network with an additional geometric prior. Specifically, we introduce a perceptual edge branch to integrate edge features into the model and propose two edge-aware losses for edge guidance. Meanwhile, we gathered a general-purpose dataset for image stitching training and evaluation (namely, RISD). Extensive experiments demonstrate that our EGCNet produces plausible results with less running time, and outperforms traditional methods, especially under circumstances of parallax and object motion.

Introduction

Image stitching plays an important role in many applications such as remote image mosaicking [1], panoramic videos [2], virtual reality [3], and panorama creation software. Conventionally, image stitching is the process of combining multiple images with overlapping areas to produce a wide-view panorama [4]. There are two ways of mitigating artifacts in the stitched image: (1) improving alignment by increasing the available degrees of freedom [5], [6], [7], [8], [9]; (2) hiding misalignments with image composition techniques such as selecting better seams in the overlapping areas. However, artifacts caused by large parallax and moving objects can hardly be concealed by better alignment, so improving the image composition is the only remedy in these cases.

The most widely used composition techniques for image stitching are seam-cutting [10], [11], [12], [13] and blending [14], [15]. The former aims at finding an invisible seam in the overlapping region of the aligned images, and the latter smooths the transitions between images near the seam. Seam-cutting can be viewed as a {0,1} labeling problem that creates a mapping between pixels in the composite and source images. The optimal seam is found by minimizing an energy function that describes the differences between the images in the overlapping region. Many efforts have been devoted to improving seam-cutting performance by penalizing the photometric difference with various energy functions. The Euclidean-metric color difference and gradient difference [12], [16], [17] were the earliest attempts. Later, other features such as texture complexity [18], Canny edges [19], [20], and saliency [21] were also taken into consideration. However, the performance of these methods is limited by weak hand-crafted features. Because it is difficult to find the best combination and to balance the weights of these features, traditional methods often lack adaptivity to varying input conditions. Moreover, these hand-crafted features gather only very local information and lack strong geometric guidance, and thus may produce structure-inconsistent results, such as tearing or cropping recognizable objects (Figure 1). In terms of computation time, the energy-based composition methods all have high time complexity, since they solve with min-cut or dynamic programming, and are far from real-time. Therefore, it is highly desirable to have a more efficient image composition method that is robust to varying input conditions and capable of utilizing features adaptively.
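To make the energy-minimization view concrete, the following toy sketch finds a vertical seam in an overlap by dynamic programming over a per-pixel photometric-difference energy. This illustrates the classic seam-cutting formulation discussed above, not EGCNet itself; the function name and the squared-color-difference energy are illustrative choices.

```python
import numpy as np

def find_seam(img_a, img_b):
    """Toy energy-based seam cut: find the vertical seam minimizing the
    squared color difference between two aligned overlap regions, via
    dynamic programming (a classic formulation, not the paper's method)."""
    # Energy: per-pixel photometric difference in the overlapping region.
    energy = ((img_a.astype(float) - img_b.astype(float)) ** 2).sum(axis=-1)
    h, w = energy.shape
    cost = energy.copy()
    # Forward pass: accumulate the cheapest connected path from the top.
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 2, w)
            cost[y, x] += cost[y - 1, lo:hi].min()
    # Backward pass: trace the minimal-cost column for each row.
    seam = np.zeros(h, dtype=int)
    seam[-1] = int(cost[-1].argmin())
    for y in range(h - 2, -1, -1):
        lo, hi = max(seam[y + 1] - 1, 0), min(seam[y + 1] + 2, w)
        seam[y] = lo + int(cost[y, lo:hi].argmin())
    return seam
```

Min-cut solvers generalize this to arbitrary 2D seams, but the cost structure is the same: the seam is pushed through regions where the two images already agree.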

Recently, convolutional neural networks (CNNs) have greatly promoted the development of computer vision. With the help of CNNs, more discriminative and adaptive image features can be extracted, and more robust performance can be reached with data-driven methods. To the best of our knowledge, the only learning-based work related to image/video stitching is by Lai et al. [22], which learns a smooth spatial interpolation between input videos. Different from theirs, we focus on the image composition process and cast the whole process as a blending problem. To seamlessly blend the pre-registered images into an artifact-free stitched image, we leverage CNNs to develop an Edge-Guided Composition Network (EGCNet) that regresses the blending weights. The insight is that the pre-registered images should be blended at regions where they are highly similar, to avoid inconsistent and noticeable image transitions. To better learn this nature of composition, we use a Siamese network [23] for feature extraction, and combine the feature maps not only through concatenation but also through a group-wise correlation layer that explicitly provides a similarity measurement, so the following layers can predict blending weights from the combined features. Our work can be seen as continuing the second line of image stitching research.
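The group-wise correlation idea can be sketched as follows: split the channels of the two Siamese feature maps into groups and take a per-group inner product at each spatial location, yielding an explicit similarity volume. This is a minimal NumPy sketch under assumed tensor shapes (C, H, W); the actual layer in the paper may differ in normalization and grouping.

```python
import numpy as np

def groupwise_correlation(feat_a, feat_b, num_groups):
    """Sketch of a group-wise correlation layer: split C channels into
    `num_groups` groups and take the mean channel-wise product per group,
    producing a (G, H, W) similarity map between the two feature maps."""
    c, h, w = feat_a.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    ga = feat_a.reshape(num_groups, c // num_groups, h, w)
    gb = feat_b.reshape(num_groups, c // num_groups, h, w)
    return (ga * gb).mean(axis=1)  # average inner product within each group
```

Compared with plain concatenation, such a layer hands the decoder an explicit per-location similarity signal, which matches the insight that blending should happen where the two images agree.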

Training such a network to predict generic composition for image stitching requires a sufficiently large training set. However, the existing image stitching datasets are all too small for effective training. Thus, to provide sufficient training data, we build a Real Image Stitching Dataset (RISD) of 500 pairs of images with various parallax levels and scene types. Each sample of RISD contains a pre-registered image pair and a visually plausible composite image as ground truth. With the help of RISD, our EGCNet can be trained in an end-to-end manner. Meanwhile, since preserving structure consistency is of great significance in the composition process, we explicitly exploit perceptual edges as guidance. To achieve this, we build a two-branch network with a composition branch and a perceptual edge branch. The perceptual edge branch detects edges of the input pre-registered images and guides the composition branch in two ways. On the one hand, edge features generated in the perceptual edge branch are embedded into the composition branch to provide a structure prior. On the other hand, edge maps produced by the perceptual edge branch are utilized in our proposed edge-aware smoothness loss and detail-preserving loss, which guide the model to pay more attention to regions with strong structural information. The whole model is trained in a joint supervised and self-supervised manner: supervised by the ground-truth composite images in RISD and self-supervised by the edge-aware losses designed around the task characteristics.
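The flavor of an edge-aware smoothness term can be sketched as below: penalize spatial variation of the predicted blending weights, but down-weight the penalty where the edge response is strong, so weight transitions are encouraged to align with structure. The exact losses in the paper are not specified here; this is a hypothetical formulation using an exponential edge gating, for illustration only.

```python
import numpy as np

def edge_aware_smoothness(weights, edge_map):
    """Hypothetical edge-aware smoothness penalty: discourage changes in
    the blending weights in flat regions, while letting them change
    freely where the edge map responds strongly (gate = exp(-edge))."""
    dx = np.abs(np.diff(weights, axis=1))   # horizontal weight changes
    dy = np.abs(np.diff(weights, axis=0))   # vertical weight changes
    gate_x = np.exp(-edge_map[:, 1:])       # small gate near strong edges
    gate_y = np.exp(-edge_map[1:, :])
    return (dx * gate_x).mean() + (dy * gate_y).mean()
```

Under this sketch, a weight transition placed on a strong edge costs almost nothing, while the same transition in a textureless region is penalized, which is one way to push seams toward structurally consistent locations.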

In summary, our main contributions are as follows:

  • 1. We propose a novel end-to-end deep learning framework for composition in image stitching.

  • 2. We illustrate the importance of preserving structure consistency in image composition, and leverage the structure prior provided by a proposed perceptual edge branch to further enhance composition performance.

  • 3. We build a Real Image Stitching Dataset (RISD), which is the first general-purpose benchmark for image stitching training.

Section snippets

Traditional Composition Methods for Image Stitching

After warping the images to align with each other, the simplest way to reduce unnatural transitions between images is to smooth the overlapping region, for example by feathering or center-weighting [4], [16], [24]. These methods take a weighted average at each pixel, but they often fail because illumination differences, misalignments, and blurring remain visible.
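The weighted-average idea can be sketched in a few lines. This toy version feathers a horizontal overlap with a linear ramp (a stand-in for the distance-to-border weights of center-weighting); real implementations derive the weights from each pixel's distance to its image boundary.

```python
import numpy as np

def feather_blend(img_a, img_b):
    """Toy feathering over a horizontal overlap: img_a's weight ramps
    linearly from 1 (left) to 0 (right) and the two images are averaged
    with complementary weights, a minimal sketch of feathering."""
    h, w = img_a.shape[:2]
    ramp = np.linspace(1.0, 0.0, w)[None, :, None]  # broadcast to (H, W, C)
    return ramp * img_a.astype(float) + (1.0 - ramp) * img_b.astype(float)
```

Because every overlap pixel is a mixture of both inputs, any residual misalignment shows up as ghosting, which is exactly the failure mode noted above.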

Better image composition algorithms have been proposed over the years, but the main idea remains the same. The

Method

In this section, we first introduce the overall framework. Then we present the details of the architecture and the loss functions. Finally, the construction method of the training dataset RISD is presented.

Implementation Details

In the first training stage, following [37], [38], we mix augmented data of BSDS500 [39] with the flipped PASCAL VOC Context dataset [40] to pre-train the perceptual edge branch. Stochastic gradient descent (SGD) randomly samples minibatches of 10 images in each iteration. The initial learning rate is set to 10⁻⁴ and divided by 10 after every 10k iterations. Momentum and weight decay are set to 0.9 and 2×10⁻⁴, respectively. For data augmentation, we adopt the same strategy as RCF [37].
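The learning-rate schedule described above (divide by 10 after every 10k iterations) is a plain step decay, which can be written as a one-line helper. The base rate of 1e-4 is taken from the text; the function name is illustrative.

```python
def step_lr(base_lr, iteration, step=10_000, gamma=0.1):
    """Step-decay schedule as described in the text: start at base_lr
    and multiply by gamma (i.e., divide by 10) every `step` iterations."""
    return base_lr * gamma ** (iteration // step)
```

For example, `step_lr(1e-4, 0)` gives 1e-4, and after 10k iterations the rate drops to 1e-5, then 1e-6 after 20k, and so on.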

In the

Limitations and future work

The proposed EGCNet performs well in a wide range of input scenarios with parallax or object movements. However, it still has some limitations: (1) Our method does not learn an overall luminance adjustment between the input images, which may cause noticeable brightness transitions under circumstances of large exposure difference (the first row of Figure 16), even though it preserves structure consistency well. (2) When the parallax is too large, or the input pre-registered images barely have

Conclusion

In this paper, we proposed an efficient EGCNet to perform the composition stage of image stitching in an end-to-end learning-based manner. EGCNet takes two unordered pre-registered images as input and learns to predict the blending weights that composite them into an artifact-free panorama. To better preserve structure consistency, we built a perceptual edge branch to explicitly provide edge guidance and further improve performance. The edge guidance is incorporated in two ways, one is

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the Key Project of the National Natural Science Foundation of China under Grant 61731009, the NSFC-RGC under Grant 61961160734, the National Natural Science Foundation of China under Grant 61871185, and the Science Foundation of Shanghai under Grant 20ZR1416200.

Qinyan Dai received the B.Sc. Degree in Computer Science and Technology from East China Normal University, Shanghai, China, in 2019. She is pursuing the masters degree at the same university. Her research interests include image stitching and image restoration.

References (44)

  • M. Xia et al., Globally consistent alignment for planar mosaicking via topology analysis, Pattern Recognition (2017).
  • T. Xiang et al., Image stitching by line-guided local warping with global similarity constraint, Pattern Recognition (2018).
  • Y. Chen et al., Semeda: Enhancing segmentation precision with semantic edge aware loss, Pattern Recognition (2020).
  • V.R. Gaddam et al., Tiling in interactive panoramic video: Approaches and evaluation, IEEE Transactions on Multimedia (2016).
  • Q. Zhao, L. Wan, W. Feng, J. Zhang, Cube2video: Navigate between cubic panoramas in real-time, IEEE Transactions on...
  • R. Szeliski, Image alignment and stitching: A tutorial, Foundations & Trends® in Computer Graphics & Vision (2006).
  • J. Zaragoza et al., As-projective-as-possible image stitching with moving DLT, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2013).
  • C.-C. Lin et al., Adaptive as-natural-as-possible image stitching, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015).
  • Y.-S. Chen et al., Natural image stitching with the global similarity prior, ECCV (2016).
  • K.-Y. Lee et al., Warping residual based image stitching for large parallax, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020).
  • A. Eden et al., Seamless image stitching of scenes with large motions and exposure differences, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06) (2006).
  • J. Jia et al., Image stitching using structure deformation, IEEE Transactions on Pattern Analysis and Machine Intelligence (2008).
  • V. Kwatra et al., Graphcut textures: image and video synthesis using graph cuts, ACM Trans. Graph. (2003).
  • J. Gao et al., Seam-driven image stitching, Eurographics (Short Papers) (2013).
  • P.J. Burt et al., A multiresolution spline with application to image mosaics, ACM Trans. Graph. (1983).
  • P. Pérez et al., Poisson image editing, ACM Trans. Graph. (2003).
  • A. Agarwala et al., Interactive digital photomontage, SIGGRAPH 2004 (2004).
  • J. Gao et al., Seam-driven image stitching, Eurographics (2013).
  • L. Li et al., Edge-enhanced optimal seamline detection for orthoimage mosaicking, IEEE Geoscience and Remote Sensing Letters (2018).
  • F. Zhang et al., Parallax-tolerant image stitching, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014).
  • K. Lin et al., Seagull: Seam-guided local alignment for parallax-tolerant image stitching, European Conference on Computer Vision (2016).
  • N. Li et al., Perception-based seam cutting for image stitching, Signal, Image and Video Processing (2018).


    Faming Fang received the Ph.D. degree in computer science from East China Normal University, Shanghai, China, in 2013. He is currently a Professor with the School of Computer Science and Technology, East China Normal University. His main research area is image processing using the variational methods, PDEs and deep learning.

    Juncheng Li received the B.Sc. degree in Computer Science from Jiangxi Normal University, Nanchang, China, in 2016. He is currently a Ph.D. student with the School of Computer Science and Technology, East China Normal University, Shanghai, China. His main research interests are image processing, machine learning and deep learning.

    Guixu Zhang received the Ph.D. degree from the Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou, China, in 1998. He is currently a Professor with the Department of Computer Sci ence and Technology, East China Normal University, Shanghai, China. His research interests include hyperspectral remote sensing and image processing.

    Aimin Zhou received the B.Sc. and M.Sc. degrees in computer science from Wuhan University, Wuhan, China, in 2001 and 2003, respectively, and the Ph.D. degree in computer science from the University of Essex, Colchester, U.K., in 2009. He is currently a Professor with the School of Computer Science and Technology, East China Normal University, Shanghai, China. He has authored over 70 peer-reviewed papers. His current research interests include machine learning, image processing, and their applications.
