Computers & Graphics

Volume 82, August 2019, Pages 53-64

Special Section on CAD & Graphics 2019

Illumination animating and editing in a single picture using scene structure estimation

https://doi.org/10.1016/j.cag.2019.05.007

Highlights

  • Propose a user-assisted depth estimation method that recovers a controllable depth map from a single still image.

  • Introduce a depth-aware intrinsic image decomposition method to estimate the specular shading, diffuse shading and albedo from the image.

  • Present a rendering-based optimization model for estimating the positions and colors of multiple lights in a scene.

  • Propose an area light sampling method that represents area light sources with point light sources for illumination editing.

Abstract

Editing the illumination in an image is a fundamental problem in image processing. In this paper, we propose a novel method to animate the illumination of a single input image by estimating the scene structure. We first estimate the depth map of the input image by incorporating user interaction into a probabilistic inference model. Then we present a depth-aware intrinsic image decomposition method, which decomposes the image into specular shading, diffuse shading, and albedo. Combining the depth map and shading maps, we develop a rendering-based optimization model to effectively estimate the positions and colors of multiple lights in a scene. With the estimated scene structure, including the depth map, reflectance, shading, and light positions and colors, we can animate and edit the illumination of the image by re-rendering the original image under a changed light configuration. Our method can effectively process images with complex shadows, multiple lights (including both point and area light sources), and specular spots. We show a variety of examples, in both indoor and outdoor environments, to validate the effectiveness of the proposed method.

Introduction

Illumination reconstruction and editing in a real image is a crucial problem in image processing and editing, and has become an active research topic in computer vision and computer graphics. For example, in 3D object compositing, the illumination must be modeled carefully to produce realistic results [1], [2]; such compositing is now widely applied in augmented reality. To remove or edit shadows, the illumination in the shadow regions must be recovered [3], [4]. Illumination reconstruction and estimation are also crucial in image relighting [5] and decomposition [6]. In this paper, we address the problem of animating the illumination in a single still image using scene structure estimation, and our system synthesizes realistic images under changing illumination conditions.

There are several difficulties in animating the illumination of a single image using scene structure estimation. First, the 3D geometry of the scene must be estimated accurately to produce realistic illumination rendering, yet recovering 3D geometry from a still image is a difficult task. Second, the positions and colors of the lights in the image need to be estimated. Current lighting estimation methods, including those using light detection devices [7], usually require additional measuring equipment and many input images, which limits their practicality for a single image. Finally, the complex specular shading and shadows that are common in real indoor images must be processed effectively to produce realistic illumination rendering results.

This paper presents a new illumination animating system that uses a four-step approach to address the above problems. We first train a Markov Random Field (MRF) model on existing RGB-D data sets and, incorporating user interaction, perform inference on the MRF to recover the 3D geometry structure (depth map) of the image scene. Then, using the estimated depth map, we develop a depth-aware intrinsic image decomposition method that effectively decomposes the input image into specular shading, diffuse shading, and albedo. Next, we propose a rendering-based optimization method to estimate the positions and colors of multiple lights in the scene from the estimated depth map and shading maps. Finally, with the estimated scene structure of the input image, i.e., the scene depth, scene albedo, and the light positions and colors, our system renders (relights) a realistic image under a changed illumination configuration, such as new light positions and colors.

Editing the illumination of a scene with area light sources is more challenging than with point light sources. To handle area light sources, we need to estimate light direction information and test light visibility, and the corresponding illumination models are much more complicated. To address these problems, we propose an area light sampling method for editing the illumination of a scene with area light sources. The basic idea is to optimally sample the area light source with point light sources and use the sampled point light sources to represent it. With this sampling method, we can not only model local area illumination but also simulate complex lighting scenes, producing visually natural illumination editing results.
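As a concrete illustration of this idea, the following minimal Python sketch uniformly samples a rectangular area light into a grid of point lights and accumulates their diffuse (Lambertian) contributions. The uniform grid, the function names, and the inverse-square falloff are our illustrative assumptions, not the authors' optimal sampling scheme:

    # Illustrative sketch, not the paper's implementation.
    import numpy as np

    def sample_area_light(corner, edge_u, edge_v, n_u, n_v, color):
        """Approximate a rectangular area light by an n_u x n_v grid of
        point lights, splitting the emitted color evenly among samples."""
        samples = []
        for i in range(n_u):
            for j in range(n_v):
                # Place each point light at the center of its grid cell.
                pos = corner + (i + 0.5) / n_u * edge_u + (j + 0.5) / n_v * edge_v
                samples.append((pos, color / (n_u * n_v)))
        return samples

    def lambertian_at(point, normal, albedo, point_lights):
        """Diffuse shading at a surface point from the sampled point lights."""
        total = np.zeros(3)
        n = normal / np.linalg.norm(normal)
        for pos, c in point_lights:
            l = pos - point
            d2 = float(l @ l)                        # squared distance to light
            l = l / np.sqrt(d2)
            total += c * max(float(n @ l), 0.0) / d2  # inverse-square falloff
        return albedo * total

    # Example: a 1x1 ceiling light sampled as a 4x4 grid of point lights.
    lights = sample_area_light(np.array([0.0, 2.0, 0.0]),
                               np.array([1.0, 0.0, 0.0]),
                               np.array([0.0, 0.0, 1.0]), 4, 4,
                               np.array([40.0, 40.0, 40.0]))
    print(lambertian_at(np.array([0.5, 0.0, 0.5]),   # point on the floor
                        np.array([0.0, 1.0, 0.0]),   # upward normal
                        np.array([0.6, 0.6, 0.6]),   # albedo
                        lights))

Denser sampling grids approximate the area light more faithfully, yielding smoother penumbrae, at proportionally higher rendering cost.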

Our system uses only a single image as input, without any additional measuring equipment, which makes our method more practical. Efficient illumination estimation and editing of a single image have a wide range of applications. We also apply our method to 3D object compositing and image recoloring, and obtain state-of-the-art results. The main contributions of this paper are as follows:

  • A user-assisted depth estimation method that recovers a controllable depth map from a single still image.

  • A depth-aware intrinsic image decomposition method that estimates the specular shading, diffuse shading, and albedo of the image.

  • A rendering-based optimization model for estimating the positions and colors of multiple lights in a scene.

  • An area light sampling method that represents area light sources with point light sources to edit the illumination of scenes with area lights.

Section snippets

Related work

Light estimation. Light estimation is a challenging problem in computer graphics, especially for indoor scenes. Indoor images generally contain multiple light sources, and these lights have different sizes and shapes, directions, intensities, and spectral characteristics. Image-based lighting [7] illuminates scenes and objects with images of light captured in the real world. This approach requires a hardware setup with additional light probes or cameras to take …

System overview

As illustrated in Fig. 2, for a single input image, our image illumination animating system consists of the following four steps:

Step 1: Depth estimation. We infer the 3D location and orientation of each image patch using a Markov Random Field (MRF) trained on RGB-D data sets. The MRF model incorporates user interaction, which produces quantitatively accurate depth maps.

Step 2: Intrinsic image decomposition. With the estimated depth map, we develop a depth-aware intrinsic image …

Depth estimation

The first step in our system is to estimate the geometry structure (depth map) of the scene. Saxena et al. [13] presented a supervised learning strategy to predict depth from a single still image. This learning-based method requires a database of RGB-D (RGB + depth) images and estimates per-pixel depth via the learned relationship between image features at multiple scales. However, this method usually produces poor results when the structure of the input image is significantly …
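As a rough sketch of what such a user-assisted inference model can look like, consider a Gaussian-MRF energy in the spirit of Make3D, extended with a user-constraint term. This formulation and its weights are our illustrative assumptions, not the authors' exact model:

    E(d) = \sum_i \left( d_i - x_i^\top \theta \right)^2
         + \lambda \sum_{(i,j) \in \mathcal{N}} w_{ij} \left( d_i - d_j \right)^2
         + \mu \sum_{k \in \mathcal{U}} \left( d_k - \hat{d}_k \right)^2

where d_i is the depth of patch i, x_i its multi-scale image features with learned weights \theta, \mathcal{N} the set of neighboring patch pairs, and \hat{d}_k the depths pinned down by user interaction; MAP inference then amounts to minimizing E(d).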

Intrinsic image decomposition

According to the illumination effects in the input image, we decompose it into three components: specular shading, diffuse shading, and albedo. We first extract the specular shading from the image, and then decompose the remaining image into diffuse shading and albedo.
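In equation form, one consistent reading of this two-stage decomposition is the following (notation ours, matching the symbols S0, D0, R0 used in the illumination estimation section; \odot denotes per-pixel, per-channel multiplication):

    I = S_0 + R_0 \odot D_0

that is, the specular shading S_0 is extracted first, and the remaining image I - S_0 is then factored into the albedo R_0 and the diffuse shading D_0.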

Illumination estimation

In the previous sections, we have processed the single input image in two ways. On the one hand, with the estimated depth map, we can decompose the input image into specular shading (S0), diffuse shading (D0), and albedo (R0). On the other hand, with the estimated depth map and light positions, we can render both the specular shading and the diffuse shading according to the Phong model. Based on the above results, we now estimate the positions of the light sources in the scene by …
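A sketch of what such a rendering-based objective can look like (our notation; the paper's actual energy terms and weights may differ):

    \min_{\{p_k, c_k\}} \; \left\| D(\{p_k, c_k\}) - D_0 \right\|^2
                        + \beta \left\| S(\{p_k, c_k\}) - S_0 \right\|^2

where p_k and c_k are the position and color of the k-th light, and D(\cdot) and S(\cdot) are the diffuse and specular shadings rendered from the estimated depth map with the Phong model; minimizing the residual aligns the rendered shadings with the decomposed ones.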

Illumination editing and animating

In the previous sections, we estimated the 3D structure of the scene and the colors and positions of its multiple lights. In this section, we detail how to render realistic images with changed light configurations (colors and positions) via the Dichromatic Reflection Model (DRM), and how to process the shadows.
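To make the re-rendering step concrete, here is a minimal Python sketch of Phong shading under point lights, in the spirit of the Phong model mentioned above. It omits the shadow processing and the full DRM treatment, and material parameters such as k_s and the shininess exponent are illustrative assumptions:

    # Illustrative sketch, not the paper's renderer.
    import numpy as np

    def phong_shade(point, normal, view_dir, albedo, lights,
                    k_s=0.3, shininess=32.0):
        """Phong shading at a surface point: a diffuse plus a specular
        term per point light, with inverse-square distance falloff."""
        n = normal / np.linalg.norm(normal)
        v = view_dir / np.linalg.norm(view_dir)
        color = np.zeros(3)
        for pos, c in lights:
            l = pos - point
            d2 = float(l @ l)
            l = l / np.sqrt(d2)
            n_dot_l = max(float(n @ l), 0.0)
            specular = 0.0
            if n_dot_l > 0.0:
                r = 2.0 * float(n @ l) * n - l   # reflect l about the normal
                specular = k_s * max(float(r @ v), 0.0) ** shininess
            color += c * (albedo * n_dot_l + specular) / d2
        return color

    # Example: one white point light above and in front of the surface point.
    print(phong_shade(np.array([0.0, 0.0, 0.0]),   # surface point
                      np.array([0.0, 1.0, 0.0]),   # surface normal
                      np.array([0.0, 1.0, 1.0]),   # direction toward camera
                      np.array([0.6, 0.3, 0.3]),   # albedo
                      [(np.array([0.0, 2.0, 1.0]),
                        np.array([30.0, 30.0, 30.0]))]))

Conceptually, editing the illumination amounts to changing the entries of lights (positions and colors) and re-evaluating such a shading model at every pixel of the estimated scene geometry.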

Animating and editing area light sources

Animating the illumination of a scene with area light sources is more challenging than animating that of a scene with point light sources, as area light sources usually have different shapes and the corresponding illumination models are much more complicated. Specifically, the main challenges are as follows. First, compared with a point light, it is harder to model the illumination emitted from an area light source, since an area light source has spatial extent. Second, the …

Results and discussion

In this section, we conduct a variety of experiments to demonstrate the effectiveness of our illumination editing and animating system. We evaluate the proposed method on both indoor and outdoor images. To further validate the system, we also evaluate it on a synthesized image for which the accurate light position is known. We implement our algorithm in C++ on a desktop PC with an Intel Core i5 3.2 GHz CPU and 8 GB of memory. For an input image with a typical size of …

Conclusion and future work

In this paper, we have presented an illumination animating and editing system for a single input image. Our system produces high-quality dynamic shadows and specular spots, which are challenging to animate from a single image. It can process scenes with point light sources as well as area light sources. In the future, we will work on illumination editing in self-shadowed regions. Video illumination editing and animating are two interesting topics in this …

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors would like to thank Chengjiang Long and Yingzhen Yang for their insightful comments and constructive suggestions. This work was partly supported by the National Key Research and Development Program of China (2017YFB1002600), the NSFC (No. 61672390), the Wuhan Science and Technology Plan Project (No. 2017010201010109), and the Key Technological Innovation Projects of Hubei Province (2018AAA062).

References (44)

  • H. Bay et al., Speeded-up robust features (SURF), Comput. Vis. Image Underst. (2008)
  • K. Karsch et al., Rendering synthetic objects into legacy photographs, ACM Trans. Graph. (2011)
  • K. Karsch et al., Automatic scene inference for 3D object compositing, ACM Trans. Graph. (2014)
  • L. Zhang et al., Shadow remover: image shadow removal based on illumination recovering optimization, IEEE Trans. Image Process. (2015)
  • M. Gryka et al., Learning to remove soft shadows, ACM Trans. Graph. (2015)
  • P. Debevec et al., Acquiring the reflectance field of a human face, Proceedings of SIGGRAPH (2000)
  • L. Zhang et al., Illumination decomposition for photograph with multiple light sources, IEEE Trans. Image Process. (2017)
  • P. Debevec, Image-based lighting, IEEE Comput. Graph. Appl. (2002)
  • J. Lopez-Moreno et al., Multiple light source estimation in a single image, Computer Graphics Forum (2013)
  • N. Neverova et al., Lighting estimation in indoor environments from low-quality images, Proceedings of ECCV 2012 Workshops and Demonstrations (2012)
  • L. Ladicky et al., Pulling things out of perspective, Proceedings of IEEE CVPR (2014)
  • D. Eigen et al., Depth map prediction from a single image using a multi-scale deep network, Proceedings of NIPS (2014)
  • W. Zhuo et al., Indoor scene structure analysis for single image depth estimation, Proceedings of IEEE CVPR (2015)
  • A. Saxena et al., Make3D: learning 3D scene structure from a single still image, IEEE Trans. Pattern Anal. Mach. Intell. (2009)
  • K. Karsch et al., Depth transfer: depth extraction from video using non-parametric sampling, IEEE Trans. Pattern Anal. Mach. Intell. (2014)
  • C. Godard et al., Unsupervised monocular depth estimation with left-right consistency, Proceedings of IEEE CVPR (2017)
  • M.F. Tappen et al., Recovering intrinsic images from a single image, IEEE Trans. Pattern Anal. Mach. Intell. (2005)
  • Q. Zhao et al., A closed-form solution to Retinex with nonlocal texture constraints, IEEE Trans. Pattern Anal. Mach. Intell. (2012)
  • J. Shen et al., Intrinsic images using optimization, Proceedings of IEEE CVPR (2011)
  • S. Bell et al., Intrinsic images in the wild, ACM Trans. Graph. (2014)
  • A. Bousseau et al., User-assisted intrinsic images, ACM Trans. Graph. (2009)
  • N. Bonneel et al., Interactive intrinsic video editing, ACM Trans. Graph. (2014)
    (2014)