Defocus to focus: Photo-realistic bokeh rendering by fusing defocus and radiance priors
Introduction
Bokeh, sometimes known as shallow depth of field (DoF), is an important aesthetic feature for photographers and is popular in videography, portraiture, and landscape photography. Bokeh is closely related to focusing. In a camera shot, focusing refers to adjusting the camera lens so that a scene at a certain distance from the camera is imaged sharply. By contrast, generating bokeh is conventionally understood as blurring out-of-focus regions and producing delightful “bokeh balls”. The bokeh ball, also referred to as the Circle of Confusion (CoC), is one of the factors that determine the artistic quality of captured images. The CoC is shaped by how a lens renders light from out-of-focus areas and typically appears as a disc whose shape follows that of the aperture. In photography, the aperture of a camera controls the amount of light that passes through the lens and thus the exposure of the photosensitive sensor; a wide aperture admits more light, which results in shallow DoF. However, not all shallow DoF is equivalent to bokeh: shallow DoF merely blurs the background, and only a blurred background with artistic patterns can be called bokeh.
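Under the thin-lens model, the relation between aperture, focal distance, and the CoC can be made concrete: for an object at distance d, with the lens of focal length f and f-number N focused at distance d_f, the CoC diameter is c = (f/N) · f · |d − d_f| / (d · (d_f − f)). The following sketch is illustrative only (function name and unit conventions are assumptions, not part of this work):

```python
def coc_diameter(d, d_f, f, N):
    """Circle-of-confusion diameter under the thin-lens model.

    d   : object distance from the lens
    d_f : focal (in-focus) distance
    f   : focal length
    N   : f-number (aperture diameter = f / N)
    All distances share the same unit; the CoC is returned in that unit.
    """
    A = f / N  # aperture diameter; a wide aperture (small N) enlarges the CoC
    return A * f * abs(d - d_f) / (d * (d_f - f))
```

For example, a 50 mm lens at f/1.8 focused at 2 m renders a background point at 10 m as a disc roughly 0.57 mm wide on the sensor, while a point exactly at the focal distance yields a CoC of zero.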
Bokeh is mostly rendered from an expensive digital single-lens reflex (DSLR) camera by professional photographers, and the settings used to sharpen in-focus areas and to blur the rest require complicated maneuvers. The costs of operation time and expensive hardware hinder easy application of bokeh to amateurs. The desire to easily operate bokeh has therefore motivated enthusiasm for vision-based bokeh rendering.
Prior work has proposed several ways to address bokeh rendering. One is to use stereo pairs to obtain disparity maps for rendering bokeh; however, this requires extra dual-pixel sensors [1] or laser scanners [2]. An alternative is deep learning-based neural rendering [3], [4], where methods synthesize blur from only a single all-in-focus image. However, most existing models overlook two factors in photo-realistic bokeh rendering: (1) the rendering of the CoC, an important artistic feature of bokeh, and (2) the sharpness of focused regions. Failing to render them can produce unrealistic effects, as shown in Fig. 1.
In this work, we are interested in single-image bokeh rendering. To render photo-realistic bokeh effects, we argue that an effective single-image bokeh rendering approach should (i) construct correct relative depth relations and perceive in-focus regions, (ii) render a physically sound CoC, and (iii) maintain the sharpness of focused regions. Rendering requires accurate depth and focus information to infer an appropriate blur amount for each pixel; we aim to integrate these two steps to simplify the pipeline. To address the rendering of the CoC, we modify layered rendering, which fuses images blurred at different levels. Specifically, we reassign the weights in rendering, following the idea of scene radiance [5], and manually recover high dynamic range (HDR) [6] to synthesize realistic shallow DoF, where image intensity is transformed into radiance in HDR. In addition, to render the CoC naturally, we design a new disk-like blur kernel. Finally, to keep the sharpness of in-focus regions during upsampling, we propose to perform image fusion beyond naive interpolation, utilizing the predicted defocus map and a Poisson gradient constraint as guidance for the fusion mask.
To this end, we present Defocus to Focus (D2F), a fusion framework for photo-realistic bokeh rendering (Fig. 2). In particular, we introduce defocus hallucination to implement relative depth prediction and focusing. We use the term ‘hallucination’ instead of ‘estimation’ because the predicted defocus map is ‘imagined’ without direct supervision. The imagined defocus map is equivalent to the absolute value of a signed depth map that unifies relative depth and focal distance (‘+’ indicates that the region-to-camera distance is farther than the focal distance, ‘−’ that it is closer than the focal distance, and ‘0’ that it equals the focal distance). The defocus map is then used as guidance for weighted layered rendering. In contrast to layered rendering, which fuses images blurred by different blur kernels, we harness the scene radiance within pixels and manually set it in HDR to synthesize the CoC. Hence, a radiance virtualization module is designed to predict scene radiance, and we choose soft disk blur kernels to increase the realism of the CoC. For efficiency, defocus hallucination and radiance virtualization are jointly learned to render bokeh at low resolution.
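The weighted layered rendering described above can be sketched in code. The sketch below is an illustrative approximation, not the paper's learned pipeline: gamma decoding stands in for the learned radiance virtualization, the fixed layer radii and triangle weighting are assumptions, and `soft_disk_kernel` merely illustrates a disk kernel with a soft one-pixel edge:

```python
import numpy as np
from scipy.ndimage import convolve

def soft_disk_kernel(radius):
    """Disk kernel with a soft ~1 px edge falloff (a 'soft disk' sketch)."""
    r = int(np.ceil(radius))
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    dist = np.sqrt(x ** 2 + y ** 2)
    k = np.clip(radius + 0.5 - dist, 0.0, 1.0)  # 1 inside, linear edge falloff
    return k / k.sum()

def layered_bokeh(img, defocus, radii=(0, 2, 4, 8), gamma=2.2):
    """Blend blur layers in (approximate) radiance space.

    img     : HxWx3 float image in [0, 1] (gamma-encoded intensity)
    defocus : HxW map of per-pixel blur radius (0 = in focus)
    """
    radiance = img ** gamma                      # intensity -> pseudo radiance
    out = np.zeros_like(radiance)
    weight = np.zeros(img.shape[:2])
    for r in radii:
        if r == 0:
            blurred = radiance
        else:
            k = soft_disk_kernel(r)
            blurred = np.stack([convolve(radiance[..., c], k, mode='nearest')
                                for c in range(3)], axis=-1)
        # triangle weight: layer r dominates where defocus is close to r
        w = np.clip(1.0 - np.abs(defocus - r) / 2.0, 0.0, 1.0)
        out += blurred * w[..., None]
        weight += w
    out /= np.maximum(weight, 1e-6)[..., None]
    return np.clip(out, 0.0, 1.0) ** (1.0 / gamma)  # back to intensity
```

Blurring in (approximate) radiance space is what makes bright highlights dominate their blurred neighborhood and spread into visible bokeh balls, which averaging gamma-encoded intensities cannot reproduce.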
To recover the resolution of the rendered result, we propose a novel deep Poisson fusion designed for bokeh rendering. We predict the fusion mask from the defocus map by training a deep Poisson network, which keeps the in-focus regions sharp against the blurred background while ensuring a smooth transition between them.
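The fusion step can be illustrated with a simplified single-channel sketch: a mask blends the full-resolution sharp image with the upsampled low-resolution bokeh, and a Poisson-style gradient term penalizes deviations of the fused gradients from the sharp image's gradients inside the in-focus mask. The loss form and helper names below are illustrative assumptions, not the trained network:

```python
import numpy as np

def fuse_with_mask(sharp_hr, bokeh_up, mask):
    """Mask-guided fusion: mask=1 keeps the sharp pixel, mask=0 the bokeh.

    sharp_hr, bokeh_up : HxW single-channel float images
    mask               : HxW soft mask in [0, 1]
    """
    return mask * sharp_hr + (1.0 - mask) * bokeh_up

def poisson_gradient_loss(pred, sharp, mask):
    """Poisson-style gradient constraint (sketch): inside the in-focus mask
    the fused result should reproduce the sharp image's gradients."""
    gx = lambda a: a[:, 1:] - a[:, :-1]   # horizontal finite differences
    gy = lambda a: a[1:, :] - a[:-1, :]   # vertical finite differences
    mx, my = mask[:, 1:], mask[1:, :]
    return (np.abs(gx(pred) - gx(sharp)) * mx).mean() + \
           (np.abs(gy(pred) - gy(sharp)) * my).mean()
```

Matching gradients rather than raw intensities is what allows the blend to transition smoothly at the in-focus boundary instead of producing a visible seam.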
Extensive experiments are conducted to validate D2F quantitatively and qualitatively. We evaluate D2F on the AIM 2020 Rendering Realistic Bokeh Challenge, which uses the large-scale EBB! dataset [3]. Results show that D2F achieves competitive performance against well-established baselines and competition entries. Specifically, D2F improves PSNR by 0.17 dB over our previous work [7]. To evaluate the fusion method, we compare our fusion module with other image fusion methods and show that it outperforms them. We also conduct ablation studies to verify the effectiveness of each module. Furthermore, we demonstrate the superiority of the soft disk blur kernel over the naive one in generating realistic CoC through qualitative visualizations, where the soft blur kernel appears more disc-like. In addition, we compare different training strategies to examine convergence behaviors. Our experiments support that D2F can render photo-realistic bokeh.
Our main contributions include the following:
- D2F: a novel framework that integrates defocus and radiance priors into photo-realistic bokeh rendering. We also apply Poisson fusion to keep the sharpness of in-focus objects;
- Defocus Hallucination: a scheme that learns relative depth and focal distance without direct supervision. The defocus map defines the degree of blurring of each pixel, so we can manually change the focal plane as well as the blur amount;
- We employ defocus hallucination in deep Poisson fusion, where the fusion mask is predicted from the predicted defocus map and a Poisson gradient constraint.
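The refocusing control enabled by the signed representation can be sketched as follows. The shift-and-scale parameterization is an illustrative assumption: a signed map (positive behind the focal plane, negative in front) yields the defocus map via an absolute value, so shifting the focal plane and scaling the blur amount become simple post-hoc edits:

```python
import numpy as np

def refocus_defocus_map(signed_map, plane_shift=0.0, blur_scale=1.0):
    """Derive a defocus (per-pixel blur amount) map from a signed map.

    signed_map  : HxW array, >0 behind the focal plane, <0 in front, 0 in focus
    plane_shift : moves the focal plane along the signed axis
    blur_scale  : globally scales the blur amount (an aperture-like control)
    """
    return blur_scale * np.abs(signed_map - plane_shift)
```

For instance, shifting the plane by +1 moves the zero-blur (in-focus) locus from regions where the signed value is 0 to regions where it is 1, without re-running depth prediction.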
The preliminary version of this work appeared in [7], which describes our runner-up solution in the AIM 2020 Rendering Realistic Bokeh Challenge. Here we extend [7] in the following aspects. First, we simplify the training scheme for fast convergence. Second, we address the remaining issue of our previous pipeline by employing image fusion with the help of a defocus map and a deep Poisson network; our fusion module keeps the sharpness of in-focus objects. Third, we conduct additional experiments and analyses to justify the design choices in the D2F framework.
Defocus estimation
Defocus estimation is closely related to depth estimation [8], [9] and defocus blur detection [10], [11]. While defocus detection determines whether a pixel is blurred, the defocus map from defocus estimation represents the amount of defocus blur in shallow-DoF images, which has many applications such as image deblurring [12], blur magnification [13], and depth estimation [14], [15], [16], [17], [18]. Defocus estimation can be categorized into two types: region based and edge based.
Defocus to focus framework
Bokeh rendering conventionally requires three components: (i) depth relations, (ii) the focal plane, and (iii) out-of-focus rendering. In this paper, we propose a Defocus to Focus (D2F) fusion framework to implement the three components. In particular, D2F simplifies (i) and (ii) into defocus hallucination, which integrates depth estimation and focal distance detection. The imagined defocus map is a useful cue implying the blur amount of bokeh. For efficiency, defocus hallucination and radiance virtualization are jointly learned to render bokeh at low resolution.
Results and discussions
In this section, we first introduce the experimental setting. Then we report the quantitative and qualitative performance of D2F on a large-scale bokeh dataset EBB!. Finally, we conduct an ablation study to analyze the influence of different factors on bokeh.
Conclusion
We have presented an effective fusion framework, D2F, to predict a bokeh image from a single narrow-aperture image. By introducing defocus hallucination within our network using only a bokeh image as supervision, we train our defocus hallucination network to produce a single-channel defocus map, improving the aesthetic quality of the synthesized bokeh. D2F also improves the fusion of blurred images in layered rendering by radiance virtualization, which transforms image intensity into scene radiance in HDR.
CRediT authorship contribution statement
Xianrui Luo: Conceptualization, Methodology, Software, Writing – original draft. Juewen Peng: Conceptualization, Methodology, Software, Data curation. Ke Xian: Writing – review & editing, Visualization, Investigation. Zijin Wu: Visualization, Validation. Zhiguo Cao: Supervision, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant No. U1913602.
References (69)
- et al., Synthetic depth-of-field with a single-camera mobile phone, ACM Trans. Graph. (2018)
- B. Busam, M. Hog, S. McDonagh, G. Slabaugh, SteReFo: Efficient Image Refocusing with Stereo Vision, in: Proceedings of...
- A. Ignatov, J. Patel, R. Timofte, Rendering Natural Camera Bokeh Effect with Deep Learning, in: Proceedings of the...
- et al., DeepLens: shallow depth of field from a single image, ACM Trans. Graph. (2018)
- et al., Virtual DSLR: High quality dynamic depth-of-field synthesis on mobile platforms, Electron. Imaging (2016)
- et al., Advanced High Dynamic Range Imaging: Theory and Practice (2017)
- et al., Bokeh rendering from defocus estimation, Computer Vision – ECCV 2020 Workshops (2020)
- et al., Monocular depth estimation with augmented ordinal depth relationships, IEEE Trans. Image Process. (2018)
- et al., Estimating depth from monocular images as classification using deep fully convolutional residual networks, IEEE Trans. Circuits Syst. Video Technol. (2017)
- et al., DeFusionNET: Defocus blur detection via recurrently fusing and refining discriminative multi-scale deep features, IEEE Trans. Pattern Anal. Mach. Intell. (2020)
- Defocus blur detection via boosting diversity of deep ensemble networks, IEEE Trans. Image Process.
- Spatially variant defocus blur map estimation and deblurring from a single image, J. Vis. Commun. Image Represent.
- Defocus magnification
- Absolute depth estimation from a single defocused image, IEEE Trans. Image Process.
- Break Ames room illusion: depth from general single images, ACM Trans. Graph.
- Depth map estimation using defocus and motion cues, IEEE Trans. Circuits Syst. Video Technol.
- Interactive stereoscopic video conversion, IEEE Trans. Circuits Syst. Video Technol.
- Blind image blur estimation via deep learning, IEEE Trans. Image Process.
- A spectral and spatial approach of coarse-to-fine blurred image region detection, IEEE Signal Process. Lett.
- Defocus map estimation from a single image, Pattern Recognit.
- A closed-form solution to natural image matting, IEEE Trans. Pattern Anal. Mach. Intell.
- Simultaneous estimation of defocus and motion blurs from single image using equivalent Gaussian representation, IEEE Trans. Circuits Syst. Video Technol.
- Defocus map estimation from a single image via spectrum contrast, Opt. Lett.
- Analyzing spatially-varying blur
- Ranking-based salient object detection and depth prediction for shallow depth-of-field, Sensors
- Synthetic defocus and look-ahead autofocus for casual videography, ACM Trans. Graph.
1. Xianrui Luo and Juewen Peng contributed equally to this work.