Multi-exposure photomontage with hand-held cameras

https://doi.org/10.1016/j.cviu.2020.102929

Highlights

  • Our method abandons the requirement of full registration before fusion.

  • Our method relaxes the condition of inputs and is suitable for more situations.

  • Our method finds good seams to hide misalignments and handle dynamic objects.

  • Our method selects proper constraints to improve final performance.

  • Our method offers the prospect of more extensive applications of image fusion.

Abstract

The paper studies image fusion from multiple images taken by hand-held cameras with different exposures. Existing methods often generate unsatisfactory results, such as blurring/ghosting artifacts, due to problematic handling of camera motions, dynamic content, and inappropriate fusion of local regions (e.g., over- or under-exposed ones). In addition, they often require high-quality image registration, which is hard to achieve in scenarios with large depth variations and dynamic textures, and is also time-consuming. In this paper, we propose to enable a rough registration by a single homography and to combine the inputs seamlessly so as to hide any possible misalignment. Specifically, the method first uses a Markov Random Field (MRF) energy for the labeling of all pixels, which assigns different labels to different aligned input images. During the labeling, it chooses well-exposed regions and skips moving objects at the same time. Then, the proposed method assembles a Laplacian image according to the labels and constructs the fusion result by solving the Poisson equation. Furthermore, it adds internal constraints when solving the Poisson equation to balance and improve the fusion results. We present various challenging examples, including static/dynamic, indoor/outdoor and daytime/nighttime scenes, to demonstrate the effectiveness and practicability of the proposed method.

Introduction

High-dynamic-range (HDR) imaging techniques have been increasingly used in consumer electronics, road traffic monitoring, and other industrial, security, or military applications (Darmont, 2012). However, digital cameras often fail to capture the full irradiance range visible to human eyes. It is therefore important to explore effective HDR synthesis methods or detailed low dynamic range (LDR) synthesis methods. HDR synthesis methods focus on generating HDR images directly; their results are usually tone-mapped to LDR images, which preserve details better than any single-exposure counterpart (Debevec and Malik, 1997, Reinhard et al., 2010, Sen et al., 2012, Kalantari, 2017, Wu et al., 2018, Yan et al., 2019). Detailed LDR synthesis methods directly synthesize the result from multi-exposure images (Burt, 1984, Burt and Kolczynski, 1993, Mertens et al., 2007, Wang et al., 2018, Ma et al., 2019). Our method belongs to the detailed LDR synthesis category.

Although multi-exposure fusion (MEF) approaches have been studied extensively, some drawbacks remain. For instance, many existing methods employ merging techniques that assume the multiple exposure images are accurately aligned (Li and Kang, 2012, Li et al., 2013, Paul et al., 2016). Thus, any misalignment due to either camera motion or dynamic content leads to so-called ghosting/blurring artifacts. Meanwhile, the Laplacian pyramid reconstruction scheme for image fusion proposed in Burt and Adelson (1983) has been widely adopted in many subsequent works (Burt and Kolczynski, 1993, Mertens et al., 2007, Shen et al., 2014). However, the method of Mertens et al. (2007) also requires the inputs to be strictly aligned: for each pixel location, every aligned candidate pixel in the stack contributes to the final pixel value. Thus, if there are any misaligned regions, the fused results suffer from ghosting or blurring artifacts. Fig. 1 shows two examples, where the input images are aligned before fusion, but the scenes contain dynamic textures or objects (tree leaves in the left example and moving persons in the right example). The fused results of Mertens et al. (2007) suffer from blurring (Fig. 1, left) and ghosting (Fig. 1, right).
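For intuition, the per-pixel weighting that makes such merging schemes ghost-prone can be sketched with the well-exposedness measure of Mertens et al. (2007). The snippet below is a minimal grayscale NumPy sketch, not the full method (which also uses contrast and saturation weights and a Laplacian-pyramid blend):

```python
import numpy as np

def well_exposedness(img, sigma=0.2):
    # Gaussian weight peaking at mid-gray 0.5, as in Mertens et al. (2007)
    return np.exp(-((img - 0.5) ** 2) / (2.0 * sigma ** 2))

def naive_fuse(stack):
    # stack: (N, H, W) grayscale exposures in [0, 1]
    w = well_exposedness(stack)
    w = w / (w.sum(axis=0) + 1e-12)  # normalize weights across the N exposures
    # every input contributes at every pixel, so any misaligned or moving
    # content blends into semi-transparent ghosts
    return (w * stack).sum(axis=0)
```

Because every exposure contributes to every output pixel, a moving object appears semi-transparently in the blend, which is exactly the ghosting shown in Fig. 1.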

Later on, de-ghosting methods were proposed to handle the aforementioned problems (Tursun et al., 2015). First, methods based on energy optimization were introduced to maintain image consistency or to distinguish different parameters (Jinno and Okuda, 2008, Granados et al., 2013). Second, flow-based methods achieve registration with pixel-level accuracy and are effective for aligning moving objects between two images (Kang et al., 2003, Zimmer et al., 2011, Kalantari, 2017). Third, patch-based methods (Sen et al., 2012, Hu et al., 2013) reconstruct the input images by patch-based synthesis with respect to one selected reference image, so as to form a fully registered image stack. Full alignment means that the reconstruction compensates for both camera and scene motion. The synthesized candidates are then sent to the fusion framework. However, patch-based reconstruction is not always robust in complicated situations, especially when encountering dynamic textures (e.g., fountains, waterfalls, tree leaves in the wind) or structured regions. Fig. 2 shows such an example, where two patch-based methods generate blurry results in tree crown regions.

High-quality full registration is challenging. For one thing, it is difficult to achieve high-quality alignment under different appearances (Cui et al., 2017). For another, large foreground (Zhang et al., 2016) and near-range objects (Liu et al., 2016) complicate the alignment, and scenes with large depth variations cannot be registered by a single homography, or even by more sophisticated models (Lin et al., 2017). Besides, non-parametric approaches such as optical flow tend to produce errors at discontinuous depth boundaries (Kalantari, 2017), and patch-based reconstruction is also prone to errors, as shown in Fig. 2.

To pursue a robust solution, the proposed method abandons the requirement of full alignment and replaces it with a rough registration by a single homography. As such, the photomontage idea proposed by Agarwala et al. (2004) is applied to compose the roughly aligned multi-exposure images. However, our setting differs from that of Agarwala et al. (2004) in two respects. First, Agarwala et al. generate composites interactively, combining parts of a set of photographs into a single composite picture; users select preferred image regions (e.g., a region containing a smiling face) in different pictures. In contrast, our solution is fully automatic because we combine image parts according to their exposure qualities. Second, the photos combined by Agarwala et al. (2004) were captured on a static tripod, whereas our inputs are captured by hand-held cameras. In our implementation, the method does not require perfect registration, as long as it finds good seams to hide the misalignment.
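The rough registration by a single homography can be sketched as follows. This is a minimal direct linear transform (DLT) estimate from point correspondences in NumPy, offered only as an illustration; a practical pipeline would first detect and track features and fit the model robustly (e.g., with RANSAC):

```python
import numpy as np

def homography_dlt(src, dst):
    # src, dst: (N, 2) matched points with N >= 4 (no three collinear);
    # returns the 3x3 homography H mapping src to dst (up to scale)
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the homography is the null vector of A: smallest right singular vector
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]
```

A single homography of this kind cannot model parallax in scenes with depth variation, which is precisely why the method only expects a rough alignment and hides the residual misalignment with seams.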

The proposed method consists of the following components. It selects sub-image regions from different roughly aligned exposures by MRF labeling and combines them seamlessly in the gradient domain. In this way, each pixel value comes from a single image, which makes it possible to preserve details and avoid blurring. Moreover, the method handles dynamic identification and exposure selection simultaneously in the MRF optimization. The selected regions are not only well-exposed but also free from the interference of dynamic objects/textures. Overall, the main contributions are:

(1) The proposed method relaxes the conditions on the inputs. Conventional image alignment algorithms often fail to align inputs from hand-held cameras with large shake. The proposed method abandons the requirement of full registration, so it can handle various complicated inputs and generate high-quality fusion results.

(2) The proposed method introduces a dynamic-exclusion technique to handle moving objects. An energy optimization is first applied to detect moving objects, and then a mask is generated to identify the dynamic pixels of each input, reflecting the probability of each pixel being static or dynamic. The final results are ghost-free and properly exposed.

(3) We propose to add some internal constraints to lighten under-exposed regions.

(4) We conduct comprehensive comparisons to demonstrate the effectiveness of our method, including objective assessment, visual comparison, complexity comparison and subjective evaluation.
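The MRF labeling at the heart of the method can be illustrated on a 1-D chain of pixels, where a Potts-style energy (a unary cost per pixel, standing in for exposure quality plus a dynamic-object penalty, plus a seam penalty for label changes) is minimized exactly by dynamic programming. This is a hypothetical NumPy sketch; the paper minimizes the analogous energy on the 2-D grid, which is typically done with graph cuts:

```python
import numpy as np

def label_chain(data_cost, smooth_lambda=1.0):
    # data_cost: (P, L) unary cost of assigning label l (i.e., input image l)
    # to pixel p; pairwise term: smooth_lambda per label change (Potts model)
    P, L = data_cost.shape
    cost = data_cost[0].astype(float).copy()
    back = np.zeros((P, L), dtype=int)
    for p in range(1, P):
        switch = cost.min() + smooth_lambda  # best cost if the label changes here
        back[p] = np.where(cost <= switch, np.arange(L), cost.argmin())
        cost = data_cost[p] + np.minimum(cost, switch)
    labels = np.empty(P, dtype=int)
    labels[-1] = int(cost.argmin())
    for p in range(P - 1, 0, -1):            # backtrack the optimal labeling
        labels[p - 1] = back[p, labels[p]]
    return labels
```

Raising the seam penalty produces larger single-source regions, which is how seams end up along low-cost boundaries that hide misalignment.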

Section snippets

Related works

HDR images can be constructed either by directly capturing them with special hardware (Nayar and Mitsunaga, 2000, Tocci et al., 2011), or by synthesizing them from multiple low dynamic range (LDR) images at different exposure levels using the camera response function (CRF) (Mitsunaga and Nayar, 1999, Grossberg and Nayar, 2003) and then applying tone mapping (Fattal et al., 2002, Rana et al., 2018) for display (Mann and Picard, 1995, Debevec and Malik, 1997). MEF methods have become the most frequently used

Method

The input images are captured by hand-held cameras with varying exposures. The first step is to align them for motion compensation. By default, the image with the median exposure is picked as the target, to which the other images are aligned. Slight misalignment errors can be tolerated by our implementation. We use the Features from Accelerated Segment Test (FAST) detector (Rosten and Drummond, 2006) for feature detection and track the features with the Kanade–Lucas–Tomasi (KLT) tracker (Shi and Tomasi, 1994).
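The KLT step can be sketched as a single Lucas–Kanade iteration that estimates a global translation from image gradients. The actual tracker operates on image pyramids and per-feature windows, so this NumPy sketch only illustrates the normal equations involved:

```python
import numpy as np

def lk_translation(I, J, eps=1e-9):
    # One Lucas-Kanade step: estimate the translation d = (dx, dy) such that
    # J(x) ~ I(x + d), from image gradients (valid for small motions only)
    Iy, Ix = np.gradient(I)          # axis 0 = rows (y), axis 1 = cols (x)
    It = J - I                       # temporal difference
    # normal equations of the linearized least-squares problem
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = np.array([np.sum(Ix * It), np.sum(Iy * It)])
    return np.linalg.solve(A + eps * np.eye(2), b)
```

In practice this update is iterated per feature window; the resulting point correspondences then feed the homography fit described above.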

Experiments

We assemble a comprehensive dataset of 135 groups of multi-exposure image sequences from previous publications, the Internet, and our own captures, covering daytime/nighttime, static/dynamic, and outdoor/indoor scenes. Based on this dataset, we conduct comprehensive experiments to verify the performance of our method, including objective assessment, visual comparison, complexity comparison and subjective evaluation. Several methods that are only suitable for static inputs (Li et al., 2013, Li and Kang,

Conclusion

We have presented a method for accurately fusing multi-exposure images captured by hand-held cameras. In image fusion, high-quality image registration is hard to achieve when scenes have large depth variations and dynamic textures. The proposed method does not require high-quality registration before fusion. It selects well-exposed regions and detects dynamic objects from roughly aligned images using MRF energy minimization. Then, the method finds good seams to hide misalignment when solving the Poisson equation.
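For reference, the gradient-domain reconstruction reduces, in 1-D, to a tridiagonal Poisson system: given a composite gradient field assembled from the labeled inputs and fixed boundary values, the fused signal is recovered by solving the discrete Poisson equation. A minimal NumPy sketch with a dense solve; a real 2-D implementation would use a sparse or multigrid solver:

```python
import numpy as np

def poisson_1d(grad, left, right):
    # Reconstruct f on n+1 samples from a target gradient field
    # grad[i] = f[i+1] - f[i], with boundary values f[0] = left, f[n] = right,
    # by solving the discrete Poisson equation f'' = div(grad)
    grad = np.asarray(grad, dtype=float)
    n = len(grad)
    div = np.diff(grad)                  # divergence of the guidance field
    A = np.zeros((n - 1, n - 1))         # tridiagonal Laplacian over interior samples
    b = div.copy()
    for i in range(n - 1):
        A[i, i] = -2.0
        if i > 0:
            A[i, i - 1] = 1.0
        if i < n - 2:
            A[i, i + 1] = 1.0
    b[0] -= left                         # move known boundary values to the rhs
    b[-1] -= right
    f = np.empty(n + 1)
    f[0], f[-1] = left, right
    f[1:-1] = np.linalg.solve(A, b)
    return f
```

When the composite gradients stitch together pieces from different exposures, the solve distributes any seam discrepancy smoothly, which is what hides residual misalignment in the fused result.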

CRediT authorship contribution statement

Ru Li: Conceptualization, Methodology, Software, Validation, Investigation, Writing - original draft, Visualization. Shuaicheng Liu: Formal analysis, Writing - review & editing, Project administration. Guanghui Liu: Writing - review & editing, Supervision. Tiecheng Sun: Writing - review & editing. Jishun Guo: Writing - review & editing.

Acknowledgements

This research was supported in part by the National Natural Science Foundation of China (NSFC) under Grants 61872067 and 61720106004, and in part by the Department of Science and Technology of Sichuan Province under Grant 2019YFH0016.

References (64)

  • Darmont, A., 2012. High Dynamic Range Imaging: Sensors and Architectures.
  • Debevec, P.E., et al., 1997. Recovering high dynamic range radiance maps from photographs. ACM Trans. Graph.
  • Eden, A., Uyttendaele, M., Szeliski, R., 2006. Seamless image stitching of scenes with large motions and exposure...
  • Fattal, R., et al., 2002. Gradient domain high dynamic range compression. ACM Trans. Graph.
  • Gallo, O., Gelfand, N., Chen, W.-C., Tico, M., Pulli, K., 2009. Artifact-free high dynamic range imaging. In: Proc....
  • Granados, M., et al., 2013. Automatic noise modeling for ghost-free HDR reconstruction. ACM Trans. Graph.
  • Grossberg, M.D., et al., 2003. Determining the camera response from images: What is knowable? IEEE Trans. Pattern Anal. Mach. Intell.
  • Guo, H., et al., 2016. Joint video stitching and stabilization from moving cameras. IEEE Trans. Image Process.
  • He, K., et al., 2013. Guided image filtering. IEEE Trans. Pattern Anal. Mach. Intell.
  • Hossny, M., et al., 2008. Comments on 'Information measure for performance of image fusion'. Electron. Lett.
  • Hu, J., Gallo, O., Pulli, K., Sun, X., 2013. HDR deghosting: How to deal with saturation? In: Proc. CVPR. pp....
  • Jia, J., et al., 2006. Drag-and-drop pasting. ACM Trans. Graph.
  • Jinno, T., Okuda, M., 2008. Motion blur free HDR image acquisition using multiple exposures. In: Proc. ICIP. pp....
  • Kalantari, N.K., 2017. Deep high dynamic range imaging of dynamic scenes. ACM Trans. Graph.
  • Kalantari, N., et al., 2013. Patch-based high dynamic range video. ACM Trans. Graph.
  • Kang, S.B., et al., 2003. High dynamic range video. ACM Trans. Graph.
  • Lee, C., et al., 2014. Ghost-free high dynamic range imaging via rank minimization. IEEE Signal Process. Lett.
  • Levin, A., Zomet, A., Peleg, S., Weiss, Y., 2004. Seamless image stitching in the gradient domain. In: Proc. ECCV. pp....
  • Li, S., et al., 2012. Fast multi-exposure image fusion with median filter and recursive filter. IEEE Trans. Consum. Electron.
  • Li, S., et al., 2013. Image fusion with guided filtering. IEEE Trans. Image Process.
  • Li, Y., et al., 2004. Lazy snapping. ACM Trans. Graph.
  • Li, H., Zhang, L., 2018. Multi-exposure fusion with CNN features. In: Proc. ICIP....