Special Section on CAD & Graphics 2019
Illumination animating and editing in a single picture using scene structure estimation
Introduction
Illumination reconstruction and editing in a real image is a crucial problem in image processing and editing, and has become a hot research topic in computer vision and computer graphics. For example, in 3D object compositing, now widely applied in augmented reality, the illumination must be modeled carefully to produce realistic results [1], [2]. To remove and edit shadows, the illumination in the shadow regions must be recovered [3], [4]. Illumination reconstruction and estimation are also crucial in image relighting [5] and decomposition [6]. In this paper, we study the problem of animating the illumination in a single still image using scene structure estimation; our system synthesizes realistic images under changing illumination conditions.
Animating the illumination of a single image using scene structure estimation raises several difficulties. First, the 3D geometry of the scene must be estimated accurately to produce realistic illumination rendering, yet recovering 3D geometry from a single still image is itself a difficult task. Second, the positions and colors of the lights in the image need to be estimated. Current lighting estimation methods, including those using light detection devices [7], usually require additional measuring equipment and many input images, which limits their practicality for handling a single image. Finally, producing realistic illumination renderings requires effectively processing the complex specular shading and shadows that are common in real indoor images.
This paper presents a new illumination animating system that uses a four-step approach to address the above problems. We first train a Markov Random Field (MRF) model that incorporates user interaction on existing RGB-D data sets, and perform inference on the MRF to create the 3D geometry structure (depth map) of the image scene. Then, using the estimated scene depth map, we develop a depth-aware intrinsic image decomposition method that effectively decomposes the input image into specular shading, diffuse shading, and albedo. Next, we propose a rendering-based optimization method to estimate the positions and colors of multiple lights in the scene from the estimated depth map and shading maps. Finally, with the estimated scene structure for the input image, i.e., the scene depth, scene albedo, and the light positions and colors, our system renders (relights) realistic images under changed illumination configurations, such as new light positions and colors.
Editing the illumination of a scene with area light sources is more challenging than with point light sources. To process area light sources, we need to estimate the light direction information and test light visibility, and the illumination models are also much more complicated. To address these problems, we propose an area light sampling method for editing the illumination of a scene with area light sources. The basic idea is to optimally sample the area light source with point light sources, and use the sampled point light sources to represent the area light source. With this sampling method, we can not only model local area illumination but also simulate complex lighting scenes, and produce visually natural illumination editing results.
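The point-sampling idea can be illustrated with a minimal NumPy sketch. The function names, the cell-centered uniform grid, and the Lambertian-only, inverse-square shading model are our illustrative assumptions, not the paper's implementation, which samples the area light optimally rather than uniformly:

```python
import numpy as np

def sample_area_light(corner, edge_u, edge_v, n_u, n_v):
    """Place an n_u x n_v grid of point-light samples over a rectangular
    area light defined by one corner and two edge vectors."""
    us = (np.arange(n_u) + 0.5) / n_u   # cell-centered parameters in [0, 1]
    vs = (np.arange(n_v) + 0.5) / n_v
    return np.array([corner + u * edge_u + v * edge_v
                     for u in us for v in vs])

def diffuse_under_area_light(point, normal, corner, edge_u, edge_v,
                             intensity, n_u=4, n_v=4):
    """Lambertian shading at one surface point: each point-light sample
    carries intensity / (n_u * n_v), and contributions are summed."""
    total = 0.0
    weight = intensity / (n_u * n_v)
    for light_pos in sample_area_light(corner, edge_u, edge_v, n_u, n_v):
        to_light = light_pos - point
        dist = np.linalg.norm(to_light)
        cos_theta = max(float(np.dot(normal, to_light / dist)), 0.0)
        total += weight * cos_theta / dist ** 2   # inverse-square falloff
    return total
```

Because each sample is an ordinary point light, the same visibility test and shading code used for point lights can be reused per sample, which is what makes the representation convenient for soft shadows.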
Our system uses only a single image as input, without any additional measuring equipment, which makes our method more practical. Efficient illumination estimation and editing of a single image have a wide range of applications; we also apply our method to 3D object compositing and image recoloring, and obtain state-of-the-art results. The main contributions of this paper are as follows:
- A user-assisted depth estimation method from a single still image with controllable depth estimation.
- A depth-aware intrinsic image decomposition method that estimates specular shading, diffuse shading, and albedo from the image.
- A rendering-based optimization model for estimating the positions and colors of multiple lights in a scene.
- An area light sampling method that represents an area light source with point light sources to edit the illumination of scenes with area lights.
Related work
Light estimation. Light estimation is a challenging problem in computer graphics, especially for indoor scenes. Indoor images generally contain multiple light sources, and these lights have different sizes, shapes, directions, intensities, and spectral characteristics. Image-based lighting [7] illuminates scenes and objects with images of light captured from the real world. This approach requires a hardware setup with additional light probes or cameras, to take
System overview
As illustrated in Fig. 2, for a single input image, our image illumination animating system consists of the following four steps:
Step 1: Depth estimation. We infer the 3D location and 3D orientation of each patch in the image using a Markov Random Field (MRF) trained on RGB-D data sets. The MRF model incorporates user interaction, which produces quantitatively accurate depth maps.
Step 2: Intrinsic image decomposition. With the estimated depth map, we develop a depth-aware intrinsic image
Depth estimation
The first step in our system is to estimate the geometry structure (depth map) of the scene. Saxena et al. [13] presented a supervised learning strategy to predict depth from a single still image. This learning-based method requires a database of RGBD (RGB+depth) images and estimates the depth of pixels in the image via the learned relationship between image features at multiple scales. However, this method usually suffers from poor results when the structure of the input image is significantly
Intrinsic image decomposition
According to the illumination effects in the input image, we decompose it into three images: specular shading, diffuse shading, and albedo. We first extract the specular shading from the image, and then decompose the remaining image into diffuse shading and albedo.
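The decomposition follows an image formation model of the form I = R * D + S (albedo times diffuse shading, plus specular shading). A minimal NumPy sketch of recomposing and inverting this model is below; the function names are ours, and a real pipeline estimates S and D from the image and depth map rather than taking them as given:

```python
import numpy as np

def recover_albedo(image, specular, diffuse, eps=1e-4):
    """Invert the formation model I = R * D + S: subtract the specular
    layer, then divide by the diffuse shading (clamped to avoid 0)."""
    return (image - specular) / np.maximum(diffuse, eps)

def recompose(albedo, specular, diffuse):
    """Re-render the image from its intrinsic layers."""
    return albedo * diffuse + specular
```

The round trip is exact wherever the diffuse shading is above the clamping threshold, which is why the decomposition can later be re-rendered under edited lights without losing surface color.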
Illumination estimation
In the previous sections, we processed the single input image in two ways. On the one hand, with the estimated depth map, we can decompose the image into specular shading (S0), diffuse shading (D0), and albedo (R0). On the other hand, with the estimated depth map and candidate light positions, we can render both the specular shading and the diffuse shading according to the Phong model. Based on the above results, we now estimate the positions of light sources in the scene by
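The rendering-based idea can be sketched as follows: render diffuse shading under a candidate light and keep the candidate that best matches the decomposed shading map. This is a minimal sketch assuming a Phong diffuse term only and a discrete candidate search; the paper's optimization also handles specular shading, light colors, and multiple lights:

```python
import numpy as np

def render_diffuse(points, normals, light_pos, intensity=1.0):
    """Phong diffuse term for every scene point (N x 3 arrays)."""
    to_light = light_pos - points
    dist = np.linalg.norm(to_light, axis=1, keepdims=True)
    n_dot_l = np.sum(normals * to_light / dist, axis=1)
    return intensity * np.clip(n_dot_l, 0.0, None) / dist[:, 0] ** 2

def estimate_light_position(points, normals, target_shading, candidates):
    """Pick the candidate light position whose rendered diffuse shading
    best matches the decomposed diffuse shading map (L2 error)."""
    errors = [np.sum((render_diffuse(points, normals, c)
                      - target_shading) ** 2)
              for c in candidates]
    return candidates[int(np.argmin(errors))]
```

In practice the discrete search would be replaced by a continuous optimizer over light position, color, and intensity, but the objective, a rendering residual against the decomposed shading, is the same.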
Illumination editing and animating
In the previous sections, we estimated the 3D structure of the scene and the colors and positions of its multiple lights. In this section, we detail how to render realistic images by changing the light configurations (colors and positions) via the Dichromatic Reflection Model (DRM), and how to process the shadows.
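A minimal sketch of the relighting step is below, covering only the diffuse body term of the DRM; the function name, the Lambertian inverse-square shading, and the binary visibility map standing in for the shadow processing are our illustrative assumptions:

```python
import numpy as np

def relight(albedo, normals, points, light_pos, light_color,
            visibility=None):
    """Diffuse body term of the DRM: I_new = R * D_new, where D_new is
    Lambertian shading under the edited light; an optional 0/1
    visibility map zeroes out pixels in shadow."""
    to_light = light_pos - points                      # (H, W, 3)
    dist = np.linalg.norm(to_light, axis=2, keepdims=True)
    n_dot_l = np.clip(np.sum(normals * to_light / dist, axis=2,
                             keepdims=True), 0.0, None)
    shading = n_dot_l / dist ** 2                      # inverse-square falloff
    if visibility is not None:
        shading = shading * visibility[..., None]
    return albedo * shading * light_color
```

Animating the light then amounts to calling this per frame with a moving `light_pos` (and an updated visibility map), while the albedo stays fixed.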
Animating and editing area light sources
Animating the illumination of a scene with area light sources is more challenging than that of a scene with point light sources, as area light sources usually have different shapes and the corresponding illumination models are much more complicated. Specifically, the main challenges are as follows. First, compared with a point light, it is more challenging to model the illumination emitted from an area light source, since an area light source has spatial extent. Second, the
Results and discussion
In this section, we conduct a variety of experiments to demonstrate the effectiveness of our illumination editing and animating system. We evaluate the proposed method on both indoor and outdoor images. To further validate our system, we also evaluate it on a synthesized image for which the accurate light position is known. We implement our algorithm in C++ on a desktop PC with an Intel Core i5 3.2 GHz CPU and 8 GB of memory. For an input image with a typical size of
Conclusion and future work
In this paper, we have presented an illumination animating and editing system for a single input image. Our system produces high-quality dynamic shadows and specular spots, which are challenging to animate from a single image. It can process scenes with point light sources as well as area light sources. In the future, we will work on illumination editing in self-shadow regions. Video illumination editing and animating are two interesting topics in this
Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors would like to thank Chengjiang Long and Yingzhen Yang for their insightful comments and constructive suggestions. This work was partly supported by The National Key Research and Development Program of China (2017YFB1002600), the NSFC (No. 61672390), Wuhan Science and Technology Plan Project (No. 2017010201010109), and Key Technological Innovation Projects of Hubei Province (2018AAA062).
References (44)
- et al., Speeded-up robust features (SURF), Comput. Vis. Image Underst., 2008.
- et al., Rendering synthetic objects into legacy photographs, ACM Trans. Graph. (TOG), 2011.
- et al., Automatic scene inference for 3D object compositing, ACM Trans. Graph., 2014.
- et al., Shadow remover: image shadow removal based on illumination recovering optimization, IEEE Trans. Image Process., 2015.
- et al., Learning to remove soft shadows, ACM Trans. Graph., 2015.
- et al., Acquiring the reflectance field of a human face, Proceedings of SIGGRAPH, 2000.
- et al., Illumination decomposition for photograph with multiple light sources, IEEE Trans. Image Process., 2017.
- Image-based lighting, IEEE Comput. Graph. Appl., 2002.
- et al., Multiple light source estimation in a single image, Computer Graphics Forum, 2013.
- et al., Lighting estimation in indoor environments from low-quality images, Computer Vision – ECCV 2012 Workshops and Demonstrations, 2012.
- Pulling things out of perspective, Proceedings of CVPR.
- Depth map prediction from a single image using a multi-scale deep network, Proceedings of NIPS.
- Indoor scene structure analysis for single image depth estimation, Proceedings of IEEE CVPR.
- Make3D: learning 3D scene structure from a single still image, IEEE Trans. Pattern Anal. Mach. Intell.
- Depth transfer: depth extraction from video using non-parametric sampling, IEEE Trans. Pattern Anal. Mach. Intell.
- Unsupervised monocular depth estimation with left-right consistency, Proceedings of IEEE CVPR.
- Recovering intrinsic images from a single image, IEEE Trans. Pattern Anal. Mach. Intell.
- A closed-form solution to Retinex with nonlocal texture constraints, IEEE Trans. Pattern Anal. Mach. Intell.
- Intrinsic images using optimization, Proceedings of CVPR.
- Intrinsic images in the wild, ACM Trans. Graph.
- User-assisted intrinsic images, ACM Trans. Graph. (TOG).
- Interactive intrinsic video editing, ACM Trans. Graph.