Information Fusion

Volume 89, January 2023, Pages 320-335

Full length article
Defocus to focus: Photo-realistic bokeh rendering by fusing defocus and radiance priors

https://doi.org/10.1016/j.inffus.2022.08.023

Highlights

  • A novel fusion framework for photo-realistic bokeh rendering.

  • Defocus hallucination integrates depth by learning to focus with indirect supervision.

  • Radiance virtualization reassigns weights for the fusion of blurred images.

  • Deep Poisson fusion utilizes a predicted defocus map and Poisson gradient constraint.

Abstract

We consider the problem of realistic bokeh rendering from a single all-in-focus image. Bokeh rendering mimics the aesthetic shallow depth-of-field (DoF) of professional photography, but the effects generated by existing methods suffer from flat background blur and blurred in-focus regions, giving rise to unrealistic results. In this work, we argue that realistic bokeh rendering should (i) model depth relations and distinguish in-focus regions, (ii) sustain sharp in-focus regions, and (iii) render a physically accurate Circle of Confusion (CoC). To this end, we present a Defocus to Focus (D2F) framework that learns realistic bokeh rendering by fusing defocus priors with the all-in-focus image and by exploiting radiance priors in layered fusion. Since no depth map is provided, we introduce defocus hallucination to integrate depth by learning to focus. The predicted defocus map indicates the per-pixel blur amount of the bokeh and is used to guide weighted layered rendering, in which images blurred by different kernels are fused according to the defocus map. To increase the realism of the bokeh, we adopt radiance virtualization to simulate scene radiance. The scene radiance used in weighted layered rendering reassigns weights in the soft disk kernel to produce the CoC. To ensure the sharpness of in-focus regions, we fuse upsampled bokeh images with the original images: the initial fusion mask is predicted from our defocus map and refined with a deep network. We evaluate our model on a large-scale bokeh dataset. Extensive experiments show that our approach renders visually pleasing bokeh effects in complex scenes. In particular, our solution received the runner-up award in the AIM 2020 Rendering Realistic Bokeh Challenge.

Introduction

Bokeh, sometimes known as shallow depth of field (DoF), is an important aesthetic feature for photographers and is popular in videography, portraiture, and landscape photography. Bokeh is closely related to focusing. In a camera shot, focusing refers to adjusting the camera lens so that the scene at a certain distance from the camera is imaged sharply. Generating bokeh, by contrast, is conventionally understood as blurring out-of-focus regions and producing delightful “bokeh balls”. The bokeh ball, also referred to as the Circle of Confusion (CoC), is one of the factors that determine the artistic quality of captured images. The CoC is shaped by how a lens renders light from out-of-focus areas and is typically a disc whose shape follows that of the aperture. In photography, the aperture of a camera controls the amount of light that passes through the lens and thus the exposure of the photosensitive sensor; a wide aperture lets more light through, which results in a shallow DoF. However, not all shallow DoF amounts to bokeh: shallow DoF often blurs the background, but only a blurred background with artistic patterns is called bokeh.
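For concreteness, the diameter of the CoC follows a closed-form expression under the standard thin-lens model. The sketch below is a textbook-optics illustration only, not part of the learned pipeline described in this paper; the function and parameter names are ours.

```python
def coc_diameter(focal_length, f_number, focus_dist, obj_dist):
    """Circle-of-confusion diameter under the thin-lens model (all lengths in
    the same unit): a point at obj_dist is imaged while the lens is focused at
    focus_dist; the aperture diameter is focal_length / f_number."""
    aperture = focal_length / f_number
    return (aperture * abs(obj_dist - focus_dist) / obj_dist
            * focal_length / (focus_dist - focal_length))

# Example: a 50 mm f/1.8 lens focused at 2 m; a background point at 10 m
# maps to a blur spot of roughly 0.57 mm on the sensor.
print(coc_diameter(0.05, 1.8, 2.0, 10.0))
```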

Bokeh is mostly produced with expensive digital single-lens reflex (DSLR) cameras by professional photographers, and the settings needed to keep in-focus areas sharp while blurring the rest require complicated maneuvers. The cost of operation time and expensive hardware keeps bokeh out of reach for amateurs. The desire for easily achievable bokeh has therefore motivated enthusiasm for vision-based bokeh rendering.

Prior work has come up with several ideas to address bokeh rendering. One is to use stereo pairs to obtain disparity maps and to render bokeh; however, it requires extra dual-pixel sensors [1] or laser scanners [2]. Another is deep learning-based neural rendering [3], [4], which synthesizes blur from only a single all-in-focus image. However, most existing models overlook two factors in photo-realistic bokeh rendering: (1) the rendering of the CoC, an important artistic feature in bokeh, and (2) the sharpness of focused regions. Failing to render them can engender unrealistic effects, as shown in Fig. 1.

In this work, we are interested in single-image bokeh rendering. To render photo-realistic bokeh effects, we argue that an effective single-image bokeh rendering approach should (i) construct correct relative depth relations and perceive in-focus regions, (ii) render a physically sound CoC, and (iii) maintain the sharpness of focused regions. Rendering requires accurate depth and focusing to determine an appropriate blur amount for each pixel; we integrate these two steps to simplify the pipeline. To address the rendering of the CoC, we modify layered rendering, which fuses blurred images of different blur levels: we reassign the rendering weights following the idea of scene radiance [5], manually recovering high dynamic range (HDR) [6] by transforming image intensity into radiance so as to synthesize realistic shallow DoF. In addition, to render the CoC naturally, we design a new disk-like blur kernel. Finally, to keep in-focus regions sharp during upsampling, we propose image fusion beyond naive interpolation, using the predicted defocus map and a Poisson gradient constraint as guidance for the fusion mask.
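To illustrate what a disk-like kernel with a softened boundary might look like, the following minimal NumPy sketch builds a disk whose edge falls off linearly; the actual soft disk kernel used in D2F may be parameterized differently.

```python
import numpy as np

def soft_disk_kernel(radius, softness=1.0):
    """Disk-shaped blur kernel whose boundary decays linearly over `softness`
    pixels instead of cutting off abruptly (a hypothetical parameterization,
    not necessarily the one used in D2F)."""
    size = 2 * int(np.ceil(radius + softness)) + 1
    y, x = np.mgrid[:size, :size] - size // 2
    dist = np.sqrt(x ** 2 + y ** 2)
    kernel = np.clip((radius + softness - dist) / softness, 0.0, 1.0)
    return kernel / kernel.sum()  # normalize so blurring preserves brightness
```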

To this end, we present Defocus to Focus (D2F), a fusion framework for photo-realistic bokeh rendering (Fig. 2). In particular, we introduce defocus hallucination to implement relative depth prediction and focusing jointly. We use the term ‘hallucination’ instead of ‘estimation’ because the predicted defocus map is ‘imagined’ without direct supervision. The imagined defocus map is equivalent to the absolute value of a signed depth map that unifies relative depth and focal distance (‘>0’ indicates the region-to-camera distance is farther than the focal distance, ‘<0’ that it is closer, and ‘=0’ that it equals the focal distance). The defocus map is then used as guidance for weighted layered rendering. On top of layered rendering, which fuses images blurred by different blur kernels, we harness per-pixel scene radiance and manually place it in HDR to synthesize the CoC. Hence, a radiance virtualization module is designed to predict scene radiance, and we choose soft disk blur kernels to increase the realism of the CoC. For efficiency, defocus hallucination and radiance virtualization are jointly learned to render bokeh at low resolution.
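The effect of radiance weighting on a single blur level can be sketched as follows, with a simple inverse gamma curve standing in for the learned radiance virtualization module and a precomputed disk kernel (e.g., the soft disk sketched above); in the full pipeline this weighted blur is applied per defocus layer and the layers are fused according to the defocus map.

```python
import numpy as np
from scipy.ndimage import convolve

def radiance_weighted_blur(image, kernel, gamma=2.2):
    """Blur so that bright pixels dominate their neighborhood and form
    disk-shaped highlights (CoC). An inverse gamma curve is used here as a
    stand-in for D2F's learned radiance virtualization.
    image: HxWx3 array in [0, 1]; kernel: 2-D blur kernel summing to 1."""
    weight = np.power(image, gamma)                 # pseudo-HDR radiance weights
    out = np.empty_like(image)
    for c in range(image.shape[-1]):
        num = convolve(image[..., c] * weight[..., c], kernel)
        den = convolve(weight[..., c], kernel)
        out[..., c] = num / np.maximum(den, 1e-8)   # normalized weighted average
    return out
```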

To recover the resolution of the rendered result, we propose a novel deep Poisson fusion designed for bokeh rendering. We predict an initial fusion mask from the defocus map and refine it with a deep Poisson network, keeping in-focus regions clear against blurred backgrounds while ensuring smooth transitions between them.
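A simplified sketch of this fusion step and of a first-order gradient (Poisson-like) constraint is given below; the mask refinement network, tensor shapes, and loss weighting are our assumptions rather than the exact D2F formulation.

```python
import torch
import torch.nn.functional as F

def fuse_with_poisson_constraint(sharp, bokeh_up, mask, target):
    """Blend the full-resolution all-in-focus image with the upsampled bokeh
    using a fusion mask, and penalize gradient mismatch against the target
    bokeh -- a simplified stand-in for D2F's Poisson gradient constraint.
    Tensors are assumed to be (B, C, H, W); mask may be (B, 1, H, W)."""
    fused = mask * sharp + (1.0 - mask) * bokeh_up

    def grads(x):  # finite-difference image gradients
        return x[..., :, 1:] - x[..., :, :-1], x[..., 1:, :] - x[..., :-1, :]

    dxf, dyf = grads(fused)
    dxt, dyt = grads(target)
    gradient_loss = F.l1_loss(dxf, dxt) + F.l1_loss(dyf, dyt)
    return fused, F.l1_loss(fused, target) + gradient_loss
```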

Extensive experiments are conducted to validate D2F quantitatively and qualitatively. We evaluate D2F on the AIM 2020 Rendering Realistic Bokeh Challenge, where a large-scale dataset called EBB! [3] is used. Results show that D2F achieves competitive results against well-established baselines and competition entries. Specifically, D2F increases PSNR by 0.17 dB over our previous work [7]. To evaluate the fusion method, we compare our fusion module with other image fusion methods and find that it outperforms them. We also design ablation studies to verify the effectiveness of each module. Furthermore, we show the superiority of the soft disk blur kernel over the naive one in generating realistic CoC through qualitative visualizations, where the soft blur kernel produces highlights that appear more like discs. In addition, we compare different training strategies to examine convergence behaviors. Our experiments support that D2F can render photo-realistic bokeh.

Our main contributions include the following:

  • D2F: a novel fusion framework that integrates defocus and radiance priors into photo-realistic bokeh rendering. We also apply Poisson fusion to keep in-focus objects sharp;

  • Defocus Hallucination: a scheme that learns relative depth and focal distance without direct supervision. The defocus map defines the degree of blurring of each pixel, so we can manually change the focal plane as well as the blur amount (see the sketch after this list);

  • Deep Poisson Fusion: we employ defocus hallucination in deep Poisson fusion, predicting the fusion mask from the hallucinated defocus map and a Poisson gradient constraint.
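As referenced in the second contribution above, refocusing from a signed defocus map could be expressed as the following minimal sketch; the signed map and the shift and scale parameters are illustrative assumptions, not D2F's actual editing interface.

```python
import numpy as np

def edit_defocus(signed_defocus, focal_shift=0.0, blur_scale=1.0):
    """Move the focal plane by shifting the zero level of the signed defocus
    map and change the overall blur strength by scaling it; the absolute value
    is the per-pixel blur radius fed to layered rendering (illustrative only)."""
    return np.abs(blur_scale * (signed_defocus - focal_shift))
```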

The preliminary version of this work appeared in [7], which describes our runner-up solution in the AIM 2020 Rendering Realistic Bokeh Challenge. Here we extend [7] in the following aspects. First, we simplify the training scheme for faster convergence. Second, we address the remaining issue of our previous pipeline, the loss of sharpness in in-focus regions, by employing image fusion guided by a defocus map and a deep Poisson network; our fusion module keeps in-focus objects sharp. Third, we conduct additional experiments and analyses to justify the design choices and the soundness of the components in the D2F framework.

Section snippets

Defocus estimation

Defocus estimation is closely related to depth estimation [8], [9] and defocus blur detection [10], [11]. While defocus detection determines whether a pixel is blurred, the defocus map produced by defocus estimation represents the amount of defocus blur in shallow-DoF images. Defocus estimation has many applications, such as image deblurring [12], blur magnification [13], and depth estimation [14], [15], [16], [17], [18]. Defocus estimation methods can be categorized into two types: region based and edge based.


Defocus to focus framework

Bokeh rendering conventionally requires three components: (i) depth relations, (ii) the focal plane, and (iii) out-of-focus rendering. In this paper, we propose a Defocus to Focus (D2F) fusion framework to implement the three components. In particular, D2F simplifies (i) and (ii) into defocus hallucination, which integrates depth estimation and focal distance detection. The imagined defocus map can be a useful cue implying the blur amount of bokeh. For efficiency consideration, defocus

Results and discussions

In this section, we first introduce the experimental setting. Then we report the quantitative and qualitative performance of D2F on a large-scale bokeh dataset EBB!. Finally, we conduct an ablation study to analyze the influence of different factors on bokeh.

Conclusion

We have presented D2F, an effective fusion framework that predicts a bokeh image from a single narrow-aperture image. Using only a bokeh image as supervision, our defocus hallucination network learns to produce a single-channel defocus map, improving the aesthetic quality of the synthesized bokeh. D2F also improves the fusion of blurred images in layered rendering by radiance virtualization. Radiance virtualization transforms image intensity

CRediT authorship contribution statement

Xianrui Luo: Conceptualization, Methodology, Software, Writing – original draft. Juewen Peng: Conceptualization, Methodology, Software, Data curation. Ke Xian: Writing – review & editing, Visualization, Investigation. Zijin Wu: Visualization, Validation. Zhiguo Cao: Supervision, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant No. U1913602.

References (69)

  • Wadhwa, N., et al., Synthetic depth-of-field with a single-camera mobile phone, ACM Trans. Graph. (2018).
  • B. Busam, M. Hog, S. McDonagh, G. Slabaugh, SteReFo: Efficient Image Refocusing with Stereo Vision, in: Proceedings of...
  • A. Ignatov, J. Patel, R. Timofte, Rendering Natural Camera Bokeh Effect with Deep Learning, in: Proceedings of the...
  • Wang, L., et al., DeepLens: shallow depth of field from a single image, ACM Trans. Graph. (2018).
  • Yang, Y., et al., Virtual DSLR: High quality dynamic depth-of-field synthesis on mobile platforms, Electron. Imaging (2016).
  • Banterle, F., et al., Advanced High Dynamic Range Imaging: Theory and Practice (2017).
  • Luo, X., et al., Bokeh rendering from defocus estimation, Computer Vision – ECCV 2020 Workshops (2020).
  • Cao, Y., et al., Monocular depth estimation with augmented ordinal depth relationships, IEEE Trans. Image Process. (2018).
  • Cao, Y., et al., Estimating depth from monocular images as classification using deep fully convolutional residual networks, IEEE Trans. Circuits Syst. Video Technol. (2017).
  • Tang, C., et al., DeFusionNET: Defocus blur detection via recurrently fusing and refining discriminative multi-scale deep features, IEEE Trans. Pattern Anal. Mach. Intell. (2020).
  • Zhao, W., et al., Defocus blur detection via boosting diversity of deep ensemble networks, IEEE Trans. Image Process. (2021).
  • Zhang, X., et al., Spatially variant defocus blur map estimation and deblurring from a single image, J. Vis. Commun. Image Represent. (2016).
  • Bae, S., et al., Defocus magnification.
  • Lin, J., et al., Absolute depth estimation from a single defocused image, IEEE Trans. Image Process. (2013).
  • K. Xian, J. Zhang, O. Wang, L. Mai, Z. Lin, Z. Cao, Structure-Guided Ranking Loss for Single Image Depth Prediction,...
  • Shi, J., et al., Break Ames room illusion: depth from general single images, ACM Trans. Graph. (2015).
  • Kumar, H., et al., Depth map estimation using defocus and motion cues, IEEE Trans. Circuits Syst. Video Technol. (2018).
  • Zhang, Z., et al., Interactive stereoscopic video conversion, IEEE Trans. Circuits Syst. Video Technol. (2013).
  • J. Shi, L. Xu, J. Jia, Just noticeable defocus blur detection and estimation, in: Proceedings of the IEEE Conference on...
  • Yan, R., et al., Blind image blur estimation via deep learning, IEEE Trans. Image Process. (2016).
  • Tang, C., et al., A spectral and spatial approach of coarse-to-fine blurred image region detection, IEEE Signal Process. Lett. (2016).
  • Zhuo, S., et al., Defocus map estimation from a single image, Pattern Recognit. (2011).
  • Levin, A., et al., A closed-form solution to natural image matting, IEEE Trans. Pattern Anal. Mach. Intell. (2007).
  • G. Xu, Y. Quan, H. Ji, Estimating defocus blur via rank of local patches, in: Proceedings of the IEEE International...
  • Kumar, H., et al., Simultaneous estimation of defocus and motion blurs from single image using equivalent Gaussian representation, IEEE Trans. Circuits Syst. Video Technol. (2019).
  • J. Park, Y.-W. Tai, D. Cho, I. So Kweon, A unified approach of multi-scale deep and hand-crafted features for defocus...
  • Tang, C., et al., Defocus map estimation from a single image via spectrum contrast, Opt. Lett. (2013).
  • Chakrabarti, A., et al., Analyzing spatially-varying blur.
  • R. Fontaine, A survey of enabling technologies in successful consumer digital imaging products, in: Proceedings of the...
  • C. Herrmann, R.S. Bowen, N. Wadhwa, R. Garg, Q. He, J.T. Barron, R. Zabih, Learning to Autofocus, in: Proceedings of...
  • Xian, K., et al., Ranking-based salient object detection and depth prediction for shallow depth-of-field, Sensors (2021).
  • Zhang, X., et al., Synthetic defocus and look-ahead autofocus for casual videography, ACM Trans. Graph. (2019).
  • P.P. Srinivasan, R. Garg, N. Wadhwa, R. Ng, J.T. Barron, Aperture supervision for monocular depth estimation, in:...
  • H. Tang, S. Cohen, B. Price, S. Schiller, K.N. Kutulakos, Depth from defocus in the wild, in: Proceedings of the IEEE...
1 Xianrui Luo and Juewen Peng contributed equally to this work.
