Multi-exposure photomontage with hand-held cameras

https://doi.org/10.1016/j.cviu.2020.102929

Highlights

  • Our method abandons the requirement of full registration before fusion.

  • Our method relaxes the condition of inputs and is suitable for more situations.

  • Our method finds good seams to hide misalignments and handle dynamic objects.

  • Our method selects proper constraints to improve final performance.

  • Our method offers the prospect of more extensive applications of image fusion.

Abstract

The paper studies image fusion from multiple images taken by hand-held cameras with different exposures. Existing methods often generate unsatisfactory results, such as blurring/ghosting artifacts, due to problematic handling of camera motions, dynamic content, and inappropriate fusion of local regions (e.g., over- or under-exposed ones). In addition, they often require high-quality image registration, which is hard to achieve in scenarios with large depth variations and dynamic textures, and is also time-consuming. In this paper, we propose to enable a rough registration by a single homography and to combine the inputs seamlessly so as to hide any possible misalignment. Specifically, the method first uses a Markov Random Field (MRF) energy for the labeling of all pixels, which assigns different labels to different aligned input images. During the labeling, it chooses well-exposed regions and skips moving objects at the same time. Then, the proposed method assembles a Laplacian image according to the labels and constructs the fusion result by solving the Poisson equation. Furthermore, it adds internal constraints when solving the Poisson equation to balance and improve the fusion results. We present various challenging examples, including static/dynamic, indoor/outdoor and daytime/nighttime scenes, to demonstrate the effectiveness and practicability of the proposed method.

Introduction

High-dynamic-range (HDR) imaging techniques have been increasingly used in consumer electronics, road traffic monitoring, and other industrial, security, or military applications (Darmont, 2012). However, digital cameras often fail to capture the full irradiance range visible to human eyes. It is therefore important to explore effective HDR synthesis methods or detailed low dynamic range (LDR) synthesis methods. HDR synthesis methods focus on generating HDR images directly; their results are usually tone-mapped to LDR images, which preserve details better than any single-exposure counterpart (Debevec and Malik, 1997, Reinhard et al., 2010, Sen et al., 2012, Kalantari, 2017, Wu et al., 2018, Yan et al., 2019). Detailed LDR synthesis methods directly synthesize the result from multi-exposure images (Burt, 1984, Burt and Kolczynski, 1993, Mertens et al., 2007, Wang et al., 2018, Ma et al., 2019). Our method belongs to the detailed LDR synthesis category.

Although multi-exposure fusion (MEF) approaches have been studied extensively, some drawbacks remain. For instance, many existing methods employ merging techniques that assume the multiple exposure images are accurately aligned (Li and Kang, 2012, Li et al., 2013, Paul et al., 2016). Thus, any misalignment due to either camera motion or dynamic content leads to so-called ghosting/blurring artifacts. Meanwhile, the Laplacian pyramid reconstruction scheme for image fusion proposed in Burt and Adelson (1983) has been widely adopted in many subsequent works (Burt and Kolczynski, 1993, Mertens et al., 2007, Shen et al., 2014). However, the method of Mertens et al. (2007) also requires the inputs to be strictly aligned: for each pixel location, every aligned candidate pixel in the stack contributes to the final pixel value. Thus, if there are any misaligned regions, the fused results suffer from ghosting or blurring artifacts. Fig. 1 shows two examples, where the input images are aligned before fusion, but the scenes contain dynamic textures or objects (tree leaves in the left example and moving persons in the right example). The fused results of Mertens et al. (2007) suffer from blurring (Fig. 1, left) and ghosting (Fig. 1, right).
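For intuition, the per-pixel weighting that makes such merging schemes ghost-prone can be sketched with the well-exposedness measure of Mertens et al. (2007). The snippet below is a minimal grayscale NumPy sketch, not the full method (which also uses contrast and saturation weights and a Laplacian-pyramid blend):

```python
import numpy as np

def well_exposedness(img, sigma=0.2):
    # Gaussian weight peaking at mid-gray 0.5, as in Mertens et al. (2007)
    return np.exp(-((img - 0.5) ** 2) / (2.0 * sigma ** 2))

def naive_fuse(stack):
    # stack: (N, H, W) grayscale exposures in [0, 1]
    w = well_exposedness(stack)
    w = w / (w.sum(axis=0) + 1e-12)  # normalize weights across the N exposures
    # every input contributes at every pixel, so any misaligned or moving
    # content blends into semi-transparent ghosts
    return (w * stack).sum(axis=0)
```

Because every exposure contributes to every output pixel, a moving object appears semi-transparently in the blend, which is exactly the ghosting shown in Fig. 1.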

Later on, de-ghosting methods were proposed to handle the aforementioned problems (Tursun et al., 2015). First, methods based on energy optimization were introduced to maintain image consistency or to distinguish different parameters (Jinno and Okuda, 2008, Granados et al., 2013). Second, flow-based methods achieve registration with pixel-level accuracy and are effective for aligning moving objects between two images (Kang et al., 2003, Zimmer et al., 2011, Kalantari, 2017). Third, patch-based methods (Sen et al., 2012, Hu et al., 2013) reconstruct the input images by patch-based synthesis with respect to one selected reference image, so as to form a fully registered image stack. Full alignment means that the reconstruction compensates for both camera and scene motion. The synthesized candidates are then sent to the fusion framework. However, patch-based reconstruction is not always robust in complicated situations, especially when encountering dynamic textures (e.g., fountains, waterfalls, tree leaves in the wind) or structured regions. Fig. 2 shows such an example, where two patch-based methods generate blurry results in tree crown regions.

High-quality full registration is challenging. For one thing, it is difficult to achieve high-quality alignment under different appearances (Cui et al., 2017). For another, large foreground (Zhang et al., 2016) and near-range objects (Liu et al., 2016) complicate the alignment, and scenes with large depth variations cannot be registered by a single homography, or even by more sophisticated models (Lin et al., 2017). Besides, non-parametric approaches such as optical flow tend to produce errors at discontinuous depth boundaries (Kalantari, 2017), and patch-based reconstruction is also prone to errors, as shown in Fig. 2.

To pursue a robust solution, the proposed method abandons the requirement of full alignment and replaces it with a rough registration by a single homography. As such, the photomontage idea proposed by Agarwala et al. (2004) is applied to compose the roughly aligned multi-exposure images. However, our setting differs from that of Agarwala et al. (2004) in two respects. First, Agarwala et al. generate composites interactively, combining parts of a set of photographs into a single composite picture; users select preferred image regions (e.g., a region containing a smiling face) in different pictures. In contrast, our solution is fully automatic because we combine image parts according to their exposure qualities. Second, the photos combined by Agarwala et al. (2004) were captured on a static tripod, whereas our inputs are captured by hand-held cameras. In our implementation, the method does not require perfect registration, as long as it finds good seams to hide the misalignment.
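The rough registration by a single homography can be sketched as follows. This is a minimal direct linear transform (DLT) estimate from point correspondences in NumPy, offered only as an illustration; a practical pipeline would first detect and track features and fit the model robustly (e.g., with RANSAC):

```python
import numpy as np

def homography_dlt(src, dst):
    # src, dst: (N, 2) matched points with N >= 4 (no three collinear);
    # returns the 3x3 homography H mapping src to dst (up to scale)
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the homography is the null vector of A: smallest right singular vector
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]
```

A single homography of this kind cannot model parallax in scenes with depth variation, which is precisely why the method only expects a rough alignment and hides the residual misalignment with seams.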

The proposed method consists of the following components. It selects sub-image regions from different roughly aligned exposures by MRF labeling and combines them seamlessly in the gradient domain. In this way, each pixel value comes from a single image, which makes it possible to preserve details and avoid blurring. Moreover, the method handles dynamic identification and exposure selection simultaneously in the MRF optimization. The selected regions are not only well-exposed but also free from the interference of dynamic objects/textures. Overall, the main contributions are:

(1) The proposed method relaxes the conditions on the inputs. Conventional image alignment algorithms often fail to align inputs from hand-held cameras with large shake. The proposed method abandons the requirement of full registration, so it can handle various complicated inputs and generate high-quality fusion results.

(2) The proposed method introduces a dynamic-exclusion technique to handle moving objects. An energy optimization is first applied to detect moving objects, and then a mask is generated to identify the dynamic pixels of each input, reflecting the probability of each pixel being static or dynamic. The final results are ghost-free and properly exposed.

(3) We propose to add some internal constraints to lighten under-exposed regions.

(4) We conduct comprehensive comparisons to demonstrate the effectiveness of our method, including objective assessment, visual comparison, complexity comparison and subjective evaluation.
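The MRF labeling at the heart of the method can be illustrated on a 1-D chain of pixels, where a Potts-style energy (a unary cost per pixel, standing in for exposure quality plus a dynamic-object penalty, plus a seam penalty for label changes) is minimized exactly by dynamic programming. This is a hypothetical NumPy sketch; the paper minimizes the analogous energy on the 2-D grid, which is typically done with graph cuts:

```python
import numpy as np

def label_chain(data_cost, smooth_lambda=1.0):
    # data_cost: (P, L) unary cost of assigning label l (i.e., input image l)
    # to pixel p; pairwise term: smooth_lambda per label change (Potts model)
    P, L = data_cost.shape
    cost = data_cost[0].astype(float).copy()
    back = np.zeros((P, L), dtype=int)
    for p in range(1, P):
        switch = cost.min() + smooth_lambda  # best cost if the label changes here
        back[p] = np.where(cost <= switch, np.arange(L), cost.argmin())
        cost = data_cost[p] + np.minimum(cost, switch)
    labels = np.empty(P, dtype=int)
    labels[-1] = int(cost.argmin())
    for p in range(P - 1, 0, -1):            # backtrack the optimal labeling
        labels[p - 1] = back[p, labels[p]]
    return labels
```

Raising the seam penalty produces larger single-source regions, which is how seams end up along low-cost boundaries that hide misalignment.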

Section snippets

Related works

HDR images can be constructed either by directly capturing them with special hardware (Nayar and Mitsunaga, 2000, Tocci et al., 2011), or by synthesizing them from multiple low dynamic range (LDR) images at different exposure levels using the camera response function (CRF) (Mitsunaga and Nayar, 1999, Grossberg and Nayar, 2003) and then applying tone mapping (Fattal et al., 2002, Rana et al., 2018) for display (Mann and Picard, 1995, Debevec and Malik, 1997). MEF methods have become the most frequently used

Method

The input images are captured by hand-held cameras with varying exposures. The first step is to align them for motion compensation. By default, the image with the median exposure is picked as the target, to which the other images are aligned. Slight misalignment errors can be tolerated by our implementation. We use the Features from Accelerated Segment Test (FAST) detector (Rosten and Drummond, 2006) for feature detection and track the features with the Kanade–Lucas–Tomasi (KLT) tracker (Shi and Tomasi, 1994).
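The KLT step can be sketched as a single Lucas–Kanade iteration that estimates a global translation from image gradients. The actual tracker operates on image pyramids and per-feature windows, so this NumPy sketch only illustrates the normal equations involved:

```python
import numpy as np

def lk_translation(I, J, eps=1e-9):
    # One Lucas-Kanade step: estimate the translation d = (dx, dy) such that
    # J(x) ~ I(x + d), from image gradients (valid for small motions only)
    Iy, Ix = np.gradient(I)          # axis 0 = rows (y), axis 1 = cols (x)
    It = J - I                       # temporal difference
    # normal equations of the linearized least-squares problem
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = np.array([np.sum(Ix * It), np.sum(Iy * It)])
    return np.linalg.solve(A + eps * np.eye(2), b)
```

In practice this update is iterated per feature window; the resulting point correspondences then feed the homography fit described above.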

Experiments

We assemble a comprehensive dataset of 135 groups of multi-exposure image sequences from previous publications, the Internet, and our own captures, covering daytime/nighttime, static/dynamic, and outdoor/indoor scenes. Based on this dataset, we conduct comprehensive experiments to verify the performance of our method, including objective assessment, visual comparison, complexity comparison and subjective evaluation. Several methods that are only suitable for static inputs (Li et al., 2013, Li and Kang,

Conclusion

We have presented a method for accurately fusing multi-exposure images captured by hand-held cameras. In image fusion, high-quality image registration is hard to achieve when scenes have large depth variations and dynamic textures. The proposed method does not require high-quality registration before fusion. It selects well-exposed regions and detects dynamic objects from roughly aligned images using MRF energy minimization. Then, the method finds good seams to hide misalignment when solving the Poisson equation.
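For reference, the gradient-domain reconstruction reduces, in 1-D, to a tridiagonal Poisson system: given a composite gradient field assembled from the labeled inputs and fixed boundary values, the fused signal is recovered by solving the discrete Poisson equation. A minimal NumPy sketch with a dense solve; a real 2-D implementation would use a sparse or multigrid solver:

```python
import numpy as np

def poisson_1d(grad, left, right):
    # Reconstruct f on n+1 samples from a target gradient field
    # grad[i] = f[i+1] - f[i], with boundary values f[0] = left, f[n] = right,
    # by solving the discrete Poisson equation f'' = div(grad)
    grad = np.asarray(grad, dtype=float)
    n = len(grad)
    div = np.diff(grad)                  # divergence of the guidance field
    A = np.zeros((n - 1, n - 1))         # tridiagonal Laplacian over interior samples
    b = div.copy()
    for i in range(n - 1):
        A[i, i] = -2.0
        if i > 0:
            A[i, i - 1] = 1.0
        if i < n - 2:
            A[i, i + 1] = 1.0
    b[0] -= left                         # move known boundary values to the rhs
    b[-1] -= right
    f = np.empty(n + 1)
    f[0], f[-1] = left, right
    f[1:-1] = np.linalg.solve(A, b)
    return f
```

When the composite gradients stitch together pieces from different exposures, the solve distributes any seam discrepancy smoothly, which is what hides residual misalignment in the fused result.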

CRediT authorship contribution statement

Ru Li: Conceptualization, Methodology, Software, Validation, Investigation, Writing - original draft, Visualization. Shuaicheng Liu: Formal analysis, Writing - review & editing, Project administration. Guanghui Liu: Writing - review & editing, Supervision. Tiecheng Sun: Writing - review & editing. Jishun Guo: Writing - review & editing.

Acknowledgements

This research was supported in part by the National Natural Science Foundation of China (NSFC) under Grants 61872067 and 61720106004, and in part by the Department of Science and Technology of Sichuan Province under Grant 2019YFH0016.

References (64)

  • Darmont, A., 2012. High Dynamic Range Imaging: Sensors and Architectures.
  • Debevec, P.E., et al., 1997. Recovering high dynamic range radiance maps from photographs. ACM Trans. Graph.
  • Eden, A., Uyttendaele, M., Szeliski, R., 2006. Seamless image stitching of scenes with large motions and exposure...
  • Fattal, R., et al., 2002. Gradient domain high dynamic range compression. ACM Trans. Graph.
  • Gallo, O., Gelfand, N., Chen, W.-C., Tico, M., Pulli, K., 2009. Artifact-free high dynamic range imaging. In: Proc....
  • Granados, M., et al., 2013. Automatic noise modeling for ghost-free HDR reconstruction. ACM Trans. Graph.
  • Grossberg, M.D., et al., 2003. Determining the camera response from images: What is knowable? IEEE Trans. Pattern Anal. Mach. Intell.
  • Guo, H., et al., 2016. Joint video stitching and stabilization from moving cameras. IEEE Trans. Image Process.
  • He, K., et al., 2013. Guided image filtering. IEEE Trans. Pattern Anal. Mach. Intell.
  • Hossny, M., et al., 2008. Comments on 'Information measure for performance of image fusion'. Electron. Lett.
  • Hu, J., Gallo, O., Pulli, K., Sun, X., 2013. HDR deghosting: How to deal with saturation? In: Proc. CVPR. pp....
  • Jia, J., et al., 2006. Drag-and-drop pasting. ACM Trans. Graph.
  • Jinno, T., Okuda, M., 2008. Motion blur free HDR image acquisition using multiple exposures. In: Proc. ICIP. pp....
  • Kalantari, N.K., 2017. Deep high dynamic range imaging of dynamic scenes. ACM Trans. Graph.
  • Kalantari, N., et al., 2013. Patch-based high dynamic range video. ACM Trans. Graph.
  • Kang, S.B., et al., 2003. High dynamic range video. ACM Trans. Graph.
  • Lee, C., et al., 2014. Ghost-free high dynamic range imaging via rank minimization. IEEE Signal Process. Lett.
  • Levin, A., Zomet, A., Peleg, S., Weiss, Y., 2004. Seamless image stitching in the gradient domain. In: Proc. ECCV. pp....
  • Li, S., et al., 2012. Fast multi-exposure image fusion with median filter and recursive filter. IEEE Trans. Consum. Electron.
  • Li, S., et al., 2013. Image fusion with guided filtering. IEEE Trans. Image Process.
  • Li, Y., et al., 2004. Lazy snapping. ACM Trans. Graph.
  • Li, H., Zhang, L., 2018. Multi-exposure fusion with CNN features. In: Proc. ICIP....