1 Introduction

Photo-realistic facial image rendering is a computational photography technique [12] for achieving facial effects that can be used in many applications, such as advertisement, movie production, digital entertainment, personalized photo editing and identity protection. Among the various current rendering techniques [14], this paper specifically focuses on the facial appearance transfer problem in image-based portrait rendering.

Facial appearance transfer is a critical component of various facial editing tasks, including face replacement [4, 13], face swapping [1, 7, 18, 19], face reenactment [5, 16] and age progression [6]. It aims to transfer the facial appearance of a reference to a target with good visual consistency.

Achieving seamless face transfer is challenging. Most previous methods are based on a facial mask and on matching facial properties (such as lighting or color) between a target and a reference. Pérez et al. [13] proposed Poisson seamless cloning via guided interpolation in the gradient domain. Dale et al. [4] used a graph-cut method that estimates the optimal seam on the face mesh to achieve video face replacement. Bitouk et al. [1] used a shading model based on a linear combination of spherical harmonics to adjust facial color and lighting for face swapping. Recently, Garrido et al. [5] proposed an automatic face reenactment system that replaces the face of an actor with the face of a user, using a color adjustment together with Poisson blending [13].

Fig. 1.

The framework of face transfer, which blends the facial region of the reference into the target based on the facial appearance-maps generated by adaptive label propagation.

The framework of facial appearance transfer is shown in Fig. 1; it aims to transfer the facial region of the aligned reference to the target to produce the blended portrait. It consists of three stages: face alignment, facial appearance-map generation, and face composition.

Due to the complex appearance differences between faces, however, a simple facial mask with Gaussian feathering may cause visual artifacts on the boundary of the transferred region; even Poisson image editing [13] may fail to perform well when there is a large lighting or color difference between the target and the reference, as shown in Fig. 4. To tackle the problem of illumination and region variances, this paper proposes a facial appearance map with good illumination-aware and region-aware properties for seamless facial appearance transfer. Inspired by Liang’s work on face enhancement [11], we formulate facial appearance map generation as a label propagation process [20] from a semi-supervised learning perspective [2]. Since the face blending problem differs from the face enhancement problem in [11], we propose an adaptive label propagation model with a new regularization structure and guided features to achieve seamless face transfer.

Based on the adaptive facial appearance map, we construct a facial appearance transfer framework containing three stages. Firstly, the facial region of the reference is aligned to the target according to the detected facial landmarks. Secondly, a facial quotient image [10, 15] and a binary mask are generated, and the guided label propagation model is then used to diffuse the initial features of the quotient image and the binary mask to obtain the adaptive facial appearance-maps for illumination and mask adjustment, respectively. Finally, we use the appearance-maps to seamlessly transfer the reference to the target. Experimental results show the effectiveness and robustness of our method compared with previous methods for various image-based facial rendering tasks, such as face replacement and face dubbing [1, 4, 13].

The main contributions of the paper are summarized as follows: (1) An adaptive label propagation model with guided features to generate the illumination-aware and region-aware facial appearance map for seamless face transfer; (2) A facial appearance transfer framework based on the adaptive facial appearance map, which achieves various image-based face blending effects, such as face replacement and face dubbing.

2 Facial Appearance Transfer Framework

2.1 Face Alignment

In face alignment, we aim to match the reference \(I_{ref}\) and the target \(I_{tar}\) to obtain the transformed reference \(I'_{ref}\) and the warped target \(I'_{tar}\) for appearance-map generation and face composition.

Firstly, we use the Viola-Jones face detector [17] and the active shape model (ASM) [3] to locate 86 landmarks on the facial components of the reference \(S_{ref}\) and the target \(S_{tar}\), respectively. Secondly, the transformed appearance \(I'_{ref}\) and shape \(S'_{ref}\) of the reference are obtained by matching the reference \(I_{ref}\) to the target \(I_{tar}\) using an affine transformation estimated from the landmarks. Finally, we warp the target by multilevel B-spline approximation (MBA) [9] according to the transformed shape of the reference \(S'_{ref}\), i.e. the appearance of the warped target is \(I'_{tar}=f_{MBA}(I_{tar},S_{tar},S'_{ref})\). For more technical details of MBA, we refer the readers to [9].
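
As an illustration of this alignment step, the following sketch estimates an affine transform from corresponding landmark sets and warps the reference onto the target with OpenCV; the landmark arrays are assumed to be given (e.g., from an ASM fit), and the MBA warp of the target [9] is not reproduced here.

```python
import cv2
import numpy as np

def align_reference(I_ref, S_ref, I_tar, S_tar):
    """Affine-align the reference image/shape to the target landmarks.

    I_ref, I_tar : HxWx3 uint8 images (reference and target).
    S_ref, S_tar : Nx2 float arrays of corresponding facial landmarks,
                   assumed to be detected beforehand (e.g. by an ASM fit).
    Returns the transformed reference image and its transformed shape.
    """
    # Estimate an affine transform mapping reference landmarks onto target
    # landmarks (least-squares fit over all correspondences).
    A, _ = cv2.estimateAffinePartial2D(S_ref.astype(np.float32),
                                       S_tar.astype(np.float32))

    h, w = I_tar.shape[:2]
    I_ref_t = cv2.warpAffine(I_ref, A, (w, h), flags=cv2.INTER_LINEAR)

    # Apply the same transform to the reference landmarks (homogeneous form).
    S_ref_h = np.hstack([S_ref, np.ones((len(S_ref), 1))])
    S_ref_t = S_ref_h @ A.T
    return I_ref_t, S_ref_t
```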

2.2 Facial Appearance-Map Generation

In face blending, directly pasting the face region of the reference onto the target often fails to perform well. According to our observation, apparent visual artifacts may be introduced into the results even when gradient-domain Poisson cloning [13] is used, if the reference and the target have large lighting or color variances. To tackle this problem, we construct two different types of facial appearance-maps (\(T_{quot}\) and \(T_{mask}\)) that perform adaptive illumination and region adjustments of the reference for seamless face transfer.

Inspired by Liang’s recent work [11] on face enhancement, we formulate facial appearance-map generation as a label propagation process, which diffuses the features within the initialized facial map to obtain the whole map. Since the two appearance-maps require different diffusion processes, we integrate different regularization structures with different guided features into the label propagation model for the corresponding map diffusion.

Specifically, the appearance-map \(T_{quot}\) aims to relight the reference so that the illumination and color of the reference appearance are consistent with the target, and it uses the quotient image [15] as the initialization. Unlike the original quotient image, which only handles the region within the face, the diffused quotient appearance-map \(T_{quot}\) facilitates relighting the face with consistent background illumination.

The appearance-map \(T_{mask}\) adaptively selects the facial region of the relit reference for seamless face transfer with a smooth region transition, and uses the binary mask of the facial landmarks as the initial map.

The benefit of the diffusion-based map generation is twofold. Firstly, it is tolerant to small inaccuracies in landmark detection, since the final map values are determined by the label propagation process rather than by the initialized values. Secondly, the map adapts to the complex facial boundary and to the texture variance of the region by using different regularization structures and guided features. More details of the structure and initialization of the label propagation model for \(T_{quot}\) and \(T_{mask}\) are presented in Sect. 3.

2.3 Face Composition

To produce the output \(I_{out}\) of face transfer, we replace the facial region of the target \(I_{tar}\) with the facial region of \(I'_{ref}\) using the generated facial maps \(T_{quot}\) and \(T_{mask}\) as follows:

$$\begin{aligned} I_{out} = I'_{ref}\circ T_{quot} \circ T_{mask} + I_{tar}\circ (J-T_{mask}), \end{aligned}$$
(1)

where \(\circ \) denotes the element-wise product operation, and J is the all-ones matrix with the same dimensions as \(T_{mask}\). The results of face blending are shown in Fig. 1, and the corresponding generated maps \(T_{quot}\) and \(T_{mask}\) are shown in Fig. 2.
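
A direct NumPy sketch of Eq. 1 is given below; it assumes \(I'_{ref}\) and \(I_{tar}\) are float arrays in [0, 1] and that the single-channel maps \(T_{quot}\) and \(T_{mask}\) are broadcast over the color channels.

```python
import numpy as np

def compose_faces(I_ref_t, I_tar, T_quot, T_mask):
    """Blend the relit reference into the target following Eq. 1.

    I_ref_t, I_tar : HxWx3 float arrays in [0, 1] (aligned reference, target).
    T_quot, T_mask : HxW float appearance-maps; expanded to HxWx1 so the
                     element-wise products broadcast over the color channels.
    """
    T_quot = T_quot[..., None]
    T_mask = T_mask[..., None]
    # I_out = I'_ref o T_quot o T_mask + I_tar o (J - T_mask)
    I_out = I_ref_t * T_quot * T_mask + I_tar * (1.0 - T_mask)
    return np.clip(I_out, 0.0, 1.0)
```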

3 Facial Appearance-Map Generation

3.1 Adaptive Label Propagation Model

The appearance-map for face transfer is formulated as a label propagation model with an adaptive regularization structure and guided features, which generates the whole map by propagating the values of the initial map to the other pixels according to pixel similarity.

Specifically, the facial appearance-map T with n pixels is mapped onto a graph \(\mathbf g = (\mathcal {V},\mathcal {E})\) of n nodes, where the node \(v_p\) corresponds to the \(p^{th}\) map location, and the edge \(e_{pq}\) links the node pair \((p, q)\) with the pixel similarity \(W_{pq}\). We initialize the node values by R (more details of R are given in Sect. 3.2), and obtain the appearance-map T by propagating the initial values of R through the graph according to the pixel-wise edge similarity given by the affinity matrix W.

The label propagation for appearance-map can be formulated as the minimization of the following quadratic cost functional:

$$\begin{aligned} Z(T)=\sum _{p}S_{pp}(T_p-R_p)^2+\frac{\lambda }{2} \sum _{p,q}W_{pq}(T_p-T_q)^2+\lambda \epsilon \sum _{p}T_p^2 \end{aligned}$$

The first term is the data term to constrain the diffusion region, where S is an \(n\times n\) diagonal matrix given by \(S_{pp}=1\) in the constraint region, otherwise \(S_{pp}=0\). The third term is a small added regularization term that prevents degeneration.

The second term is the smoothness term that determines the local smoothness property of the generated map T, where \(\lambda \) balances the relative weights of the data term and the smoothness term; the entry \(W_{pq}\) of the weight matrix is non-zero iff \(v_p\) and \(v_q\) are “neighbors”, and its value measures the similarity between the nodes (pixels). In this paper, we use the typical values \(\lambda =1\) and \(\epsilon =0.0001\) for all experiments.

The smoothness term is closely related to the graph Laplacian \(L_g\). Specifically, D is a diagonal matrix with \(D_{pp}=\sum _q W_{pq}\), and \(L_g=D-W\) is the un-normalized graph Laplacian. A more compact form of the cost function can be obtained as follows:

$$\begin{aligned} Z(T) = \Vert S(T-R)\Vert ^2+\lambda T^\top (L_g+\epsilon I) T. \end{aligned}$$
(2)

The derivative of the cost is

$$\begin{aligned} \begin{aligned} \frac{1}{2}\frac{\partial Z(T)}{\partial T}&= S(T-R)+\lambda (L_g+\epsilon I) T\\&= (S+\lambda L_g + \lambda \epsilon I)T-SR, \end{aligned} \end{aligned}$$
(3)

T can be obtained when the derivative is set to 0:

$$\begin{aligned} T =(S+\lambda L_g + \lambda \epsilon I)^{-1}SR=L^{-1}SR, \end{aligned}$$
(4)

which is a linear system with a symmetric, positive-definite Laplacian matrix L. It can be solved efficiently by the conjugate gradient method with multi-level preconditioning [8].

Also, Eq. 4 can be solved using Jacobi iteration, which is similar to the iterative label propagation proposed by Zhu and Ghahramani [20] and to Liang’s mask propagation model [11], except for the weight matrix that controls the diffusion property.
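
As an implementation sketch, Eq. 4 can be assembled as a sparse linear system and solved with a conjugate gradient solver; the snippet below assumes a precomputed sparse affinity matrix W (e.g., built from Eq. 5), the diagonal entries of the constraint matrix S, and the flattened initial map R.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def propagate_labels(W, S_diag, R, lam=1.0, eps=1e-4):
    """Solve (S + lam*L_g + lam*eps*I) T = S R for the appearance-map T (Eq. 4).

    W      : n x n sparse, symmetric affinity matrix (Eq. 5).
    S_diag : length-n array with 1 where the initial map R is constrained, else 0.
    R      : length-n array holding the initial map values.
    """
    n = W.shape[0]
    D = sp.diags(np.asarray(W.sum(axis=1)).ravel())
    L_g = D - W                                        # un-normalized graph Laplacian
    S = sp.diags(S_diag)
    L = S + lam * L_g + lam * eps * sp.identity(n)     # symmetric positive-definite
    b = S_diag * R                                     # right-hand side S R
    T, info = spla.cg(L, b, maxiter=2000)              # conjugate gradient solve
    return T
```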

To obtain the appearance map of face transfer, we construct a new kernel structure with guided features for the weight matrix W of appearance-map diffusion.

3.2 Diffusions of Facial Appearance-Map

The edge-aware property of the optimization-based label propagation model is mostly controlled by the smoothness term, specifically the similarity metric of the weight matrix \(W_{pq}\). To produce the appearance-map for face transfer, we design a new kernel structure with guided features:

$$\begin{aligned} W_{pq}=\frac{c_{pq}G'_{pq}}{\Vert G_p-G_q\Vert ^\alpha +\varepsilon }, \end{aligned}$$
(5)

where G and \(G'\) are the guided features that control the local property of the map diffusion, c and \(\alpha \) are parameters that adjust the sensitivity to the guided features, and \(\varepsilon \) is a small constant to avoid division by zero (typically \(\varepsilon =0.0001\)). The appearance-maps \(T_{quot}\) and \(T_{mask}\) are generated with different initializations \(R_{\{quot,mask\}}\) and weight matrices \(W_{\{quot,mask\}}\) using the corresponding guided features and parameters.
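
A minimal sketch of building W on a 4-connected pixel grid is shown below; the guided features G and \(G'\) are assumed to be given as images of the same size, the scalar c stands in for \(c_{pq}\), and the edge value of \(G'\) is approximated by averaging its two endpoints, which simplifies the per-edge quantities of Eq. 5.

```python
import numpy as np
import scipy.sparse as sp

def build_weight_matrix(G, G_prime, c=1.0, alpha=1.0, eps=1e-4):
    """Build the sparse affinity matrix W of Eq. 5 on a 4-connected grid.

    G, G_prime : HxW guided-feature images controlling the diffusion.
    Returns an n x n symmetric sparse matrix, with n = H*W.
    """
    H, W_ = G.shape
    idx = np.arange(H * W_).reshape(H, W_)
    rows, cols, vals = [], [], []

    # Horizontal and vertical neighbor pairs (p, q).
    pairs = [
        (idx[:, :-1], idx[:, 1:], G[:, :-1], G[:, 1:], G_prime[:, :-1], G_prime[:, 1:]),
        (idx[:-1, :], idx[1:, :], G[:-1, :], G[1:, :], G_prime[:-1, :], G_prime[1:, :]),
    ]
    for p_idx, q_idx, g_p, g_q, gp_p, gp_q in pairs:
        gp_edge = 0.5 * (gp_p + gp_q)   # edge value of G' (endpoint average, a simplification)
        w = (c * gp_edge) / (np.abs(g_p - g_q) ** alpha + eps)   # Eq. 5
        rows.append(p_idx.ravel())
        cols.append(q_idx.ravel())
        vals.append(w.ravel())

    rows = np.concatenate(rows)
    cols = np.concatenate(cols)
    vals = np.concatenate(vals)
    n = H * W_
    W_mat = sp.coo_matrix((vals, (rows, cols)), shape=(n, n))
    return (W_mat + W_mat.T).tocsr()    # symmetrize: add the (q, p) direction
```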

Fig. 2.

Diffusion of the facial appearance-maps \(T_{quot}\) and \(T_{mask}\) based on adaptive label propagation with different guided features.

The appearance-map \(T_{quot}\) aims to adjust the illumination of the reference according to the target, based on the facial shading model of the quotient image [15], as shown in Fig. 2(c). To produce \(T_{quot}\), we set \(R_{quot}=\frac{f_{aWLS}(I'_{tar})}{f_{aWLS}(I'_{ref})}\), where \(R_{quot}\) is the quotient image of the matched target \(I'_{tar}\) and reference \(I'_{ref}\) computed with Liang’s adaptive weighted least squares filter \(f_{aWLS}\) [10] for edge-aware smoothing, as shown in Fig. 2(b). For the weight matrix \(W_{quot}\), we set \(\alpha =1\) and \(G=\log L'_{ref}\), where \(L'_{ref}\) is the luminance of \(I'_{ref}\) (Fig. 2(a)); the value of \(cG'\) is small within the facial region and large in the background, so that the features of the quotient image diffuse across the significant edges within the facial region to the whole image.
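
A hedged sketch of the \(R_{quot}\) initialization is given below; since the aWLS filter of [10] is not reproduced here, a bilateral filter on the luminance channel is used as a stand-in edge-aware smoother, so the sketch only approximates the initialization described above.

```python
import cv2
import numpy as np

def init_quotient_map(I_tar_warped, I_ref_aligned, eps=1e-4):
    """Initialize R_quot as the smoothed luminance quotient of target / reference.

    The paper uses the adaptive WLS filter of [10]; a bilateral filter is used
    here as a stand-in edge-aware smoother, so this is only an approximation.
    I_tar_warped, I_ref_aligned : HxWx3 uint8 BGR images.
    """
    def smooth_luminance(img):
        L = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
        return cv2.bilateralFilter(L, d=9, sigmaColor=0.1, sigmaSpace=7.0)

    L_tar = smooth_luminance(I_tar_warped)
    L_ref = smooth_luminance(I_ref_aligned)
    R_quot = L_tar / (L_ref + eps)     # quotient image: target over reference
    return R_quot
```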

The appearance-map \(T_{mask}\) is responsible for pasting the facial region of the reference onto the target with a smooth transition between the different regions. For \(T_{mask}\), we set \(R_{mask}\) as a binary mask derived from the facial landmarks, as shown in Fig. 2(d). To produce \(T_{mask}\) with an adaptive region boundary, we set \(\alpha =1.2\), \(G=\log L'_{ref}\) and \(c=0.5\) with \(G'=J\), where \(L'_{ref}\) is the luminance of \(I'_{ref}\) and J is the all-ones matrix. The map diffusion is controlled by the gradient of the guided feature G, which ensures a smooth transition of the blended region between \(I'_{ref}\) and \(I_{tar}\), as shown in Fig. 2(e).
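
The initial binary mask can be rasterized from the facial landmarks, for instance by filling the convex hull of the landmark points, as sketched below; the subsequent diffusion of Sect. 3.1 then turns this hard mask into the soft, region-aware map \(T_{mask}\).

```python
import cv2
import numpy as np

def init_binary_mask(image_shape, landmarks):
    """Rasterize the initial binary mask R_mask from the facial landmarks.

    image_shape : (H, W) of the target image.
    landmarks   : Nx2 array of landmark coordinates of the aligned reference.
    The convex hull of the landmarks is filled with 1; everything else is 0.
    """
    H, W = image_shape[:2]
    mask = np.zeros((H, W), dtype=np.uint8)
    hull = cv2.convexHull(landmarks.astype(np.int32))
    cv2.fillConvexPoly(mask, hull, 1)
    return mask.astype(np.float32)
```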

4 Experiments

4.1 Basic Evaluation

The evaluations of the facial appearance-maps are shown in Figs. 2 and 3. The results show that the generated \(T_{quot}\) efficiently propagates the quotient values from the constrained regions of \(R_{quot}\) to the other regions, such as the eyes, eyebrows and background, and preserves the illumination consistency in the blended face. The appearance-map \(T_{mask}\) is generated with a smooth transition that adapts to the boundary between the facial regions of the two faces. The illumination-aware and region-aware diffusion of \(T_{quot}\) and \(T_{mask}\) ensures the robustness of the appearance transfer for faces with different properties, as shown in the face transfer experiments.

Fig. 3.

Facial appearance-maps for quotient-based illumination diffusion (\(T_{quot}\)) and blending mask diffusion (\(T_{mask}\)) using the proposed label propagation with the corresponding guided features.

Fig. 4.

Comparison with Poisson image cloning [13] for faces with large differences in age, color and lighting.

Fig. 5.

Comparison with Bitouk et al. [1] for face replacement using a reference-target pair with different gender and roll rotation.

Fig. 6.

Comparison with Dale et al. [4] for face dubbing, which aims to transfer the series of facial appearances of the reference to the target. Comparison of the close-up images in the top rows illustrates that our method obtains better illumination consistency with the target than Dale’s [4].

The basic experimental evaluations of face blending with the appearance-maps were performed on face pairs with significantly different appearance properties, such as lighting, color, age and gender, as shown in Figs. 4, 5 and 6. The test images were taken from the FEI face database or the internet. The good visual consistency of the results indicates the effectiveness and robustness of our method.

4.2 Comparison with Related Methods

We also compare with related methods for face replacement [1, 13]. Figure 4 shows the comparison with Poisson cloning [13]. Because it depends on the gradient and the boundary of the blended region, the results of [13] are sensitive to the lighting and color differences between the faces. In contrast, our method obtains natural face blending effects. The comparison with Bitouk’s method in Fig. 5 further validates the effectiveness of our diffusion-based model.

We also compared Dale’s method [4] with ours for face dubbing, which aims to transfer the series of facial appearances of the reference to the target. The results indicate that both methods achieve good visual consistency in a global manner, as shown in Fig. 6. The close-up images of the local region in the first rows of Fig. 6, however, show subtle differences. Dale’s method [4] tends to transfer the lighting property of the reference to the target, while ours tends to preserve the original appearance property of the target, which is complementary to [4].

5 Conclusion

This paper proposes a label propagation model with adaptive regularization to achieve facial blending with good visual consistency. Specifically, the illumination-aware and region-aware facial appearance maps are generated by diffusion with different guided features. Experiments illustrate the effectiveness and robustness of our method for face replacement and face dubbing.