We introduce a deep-learning-based algorithm to infer high-fidelity facial reflectance from a single image. The algorithm uses convolutional neural networks to encode the input image into a latent representation, from which a decoder and a detail-enhancing network reconstruct decoupled facial reflectance (albedo, specular, and normal) as well as the environmental lighting. These decoupled components, together with a 3D facial mesh estimated from the image, are then fed into a differentiable renderer to produce a rendered facial image. Because the renderer is differentiable, we can iteratively optimize the latent representation of the facial image by minimizing the image-space reconstruction loss. Experimental results show that optimizing the latent representation through the differentiable renderer effectively reduces the discrepancy between the original image and the rendered one, leading to a more accurate reconstruction of characteristic facial features such as skin tone, lip color, and facial hair.
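
To make the latent optimization loop concrete, the following is a minimal sketch under strong simplifying assumptions, not the networks or renderer used in this work. A hypothetical linear map (`W`, `b`) stands in for the decoder, a per-pixel product of albedo and a fixed diffuse-plus-ambient shading term stands in for the differentiable renderer, and the gradient of the image-space loss is derived by hand instead of by automatic differentiation. All names, sizes, and the learning rate are illustrative.

```python
import numpy as np

def decode_albedo(z, W, b):
    """Hypothetical linear 'decoder': latent code -> per-pixel albedo."""
    return W @ z + b

def render(albedo, shading):
    """Toy differentiable renderer: albedo times a fixed shading term."""
    return albedo * shading

def optimize_latent(z0, W, b, shading, target, lr=0.2, steps=300):
    """Gradient descent on the latent code to minimize image-space MSE."""
    z = z0.copy()
    losses = []
    for _ in range(steps):
        residual = render(decode_albedo(z, W, b), shading) - target
        losses.append(float(np.mean(residual ** 2)))
        # Chain rule through renderer and decoder:
        # dL/dz = (2 / n) * W^T (shading * residual)
        grad = (2.0 / residual.size) * (W.T @ (shading * residual))
        z -= lr * grad
    residual = render(decode_albedo(z, W, b), shading) - target
    losses.append(float(np.mean(residual ** 2)))
    return z, losses

# Synthetic setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_pixels, latent_dim = 64, 4
W = rng.normal(size=(n_pixels, latent_dim))
b = rng.normal(size=n_pixels)

# Shading from unit normals and one directional light, plus an ambient term.
normals = rng.normal(size=(n_pixels, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
light = np.array([0.0, 0.0, 1.0])
shading = np.clip(normals @ light, 0.0, None) + 0.2

# The "observed" image is rendered from a latent code hidden from the optimizer.
z_true = rng.normal(size=latent_dim)
target = render(decode_albedo(z_true, W, b), shading)

# Start from zeros, standing in for an imperfect initial encoder estimate.
z0 = np.zeros(latent_dim)
z_opt, losses = optimize_latent(z0, W, b, shading, target)
print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.2e}")
```

In this toy setting the loss is quadratic in the latent code, so plain gradient descent suffices. With nonlinear networks and a full renderer, the same loop would typically rely on automatic differentiation and a more robust optimizer.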
