Abstract
Facial blending is critical for various facial editing applications; its goal is to transfer the facial appearance of a reference to a target in a seamless manner. However, when there are significant illumination or color differences between the reference and the target, visual artifacts are likely to be introduced into the result. To tackle this problem, we propose content-aware masks that adaptively adjust the facial lighting and the blended region to achieve seamless face blending. To generate content-aware masks with good visual consistency, we formulate the task as a label propagation process from a semi-supervised learning perspective, in which the intensities of the initialized masks are propagated over the whole image based on the local visual similarity of the images. We then construct a content-aware face blending framework that consists of three stages. Firstly, the facial regions of the reference and the target are aligned according to the detected facial landmarks. Secondly, a facial quotient image and a binary mask are obtained as the initialized masks, and the content-aware masks for illumination and region adjustment are generated using the label propagation model with different guided features. Finally, we blend the reference into the target using the generated masks to produce the face blending effects. Experimental results show the effectiveness and robustness of our method for different image-based facial rendering tasks.
This research was supported in part by the National Natural Science Foundation of China under Grant No. 61502176, 61872151, the Natural Science Foundation of Guangdong Province under Grant No. 2016A030313480, the Pearl River S&T Nova Program of Guangzhou under Grant No. 201806010088 and Fundamental Research Funds for the Central Universities (No. 2017BQ058).
1 Introduction
Photo-realistic rendering of facial images is a computational photography technique [12] for achieving facial effects used in many applications, such as advertisement, movie production, digital entertainment, personalized photo editing and identity protection. Among the various rendering techniques [14], this paper focuses specifically on the facial appearance transfer problem of image-based portrait rendering.
Facial appearance transfer is a critical component of various facial editing tasks, including face replacement [4, 13], face swapping [1, 7, 18, 19], face reenactment [5, 16] and age progression [6]. It aims to transfer the facial appearance of a reference to a target with good visual consistency.
Achieving seamless face transfer is challenging. Most previous methods rely on a facial mask and on matching facial properties (such as lighting or color) between the target and the reference. Pérez et al. [13] proposed Poisson seamless cloning via guided interpolation in the gradient domain. Dale et al. [4] used a graph-cut method that estimates the optimal seam on the face mesh to achieve video face replacement. Bitouk et al. [1] used a shading model based on a linear combination of spherical harmonics to adjust facial color and lighting for face swapping. More recently, Garrido et al. [5] proposed an automatic face reenactment system that replaces the face of an actor with the face of a user, using color adjustment with Poisson blending [13].
The framework of facial appearance transfer, which transfers the facial region of the aligned reference to the target to produce the blended portrait, is shown in Fig. 1. It consists of three stages: face alignment, facial appearance-map generation, and face composition.
Due to the complex appearance differences between faces, however, a simple facial mask with Gaussian feathering may cause visual artifacts on the boundary of the transferred region; even Poisson image editing [13] may fail to perform well when there is a large lighting or color difference between the target and the reference, as shown in Fig. 4. To tackle the problem of illumination and region variances, this paper proposes a facial appearance map with good illumination-aware and region-aware properties for seamless facial appearance transfer. Inspired by Liang’s work on face enhancement [11], we formulate facial appearance-map generation as a label propagation process [20] from a semi-supervised learning perspective [2]. Since the face blending problem differs from the face enhancement problem in [11], we propose an adaptive label propagation model with a new regularization structure and guided features to achieve seamless face transfer.
Based on the adaptive facial appearance map, we construct a facial appearance transfer framework containing three stages. Firstly, the facial region of the reference is aligned to the target according to the detected facial landmarks. Secondly, a facial quotient image [10, 15] and a binary mask are generated, and the guided label propagation model is used to diffuse the initial features of the quotient image and the binary mask into the adaptive facial appearance-maps for illumination and mask adjustment, respectively. Finally, we use the appearance-maps to seamlessly transfer the reference to the target. Experimental results show the effectiveness and robustness of our method compared with previous methods for various image-based facial rendering tasks, such as face replacement and face dubbing in [1, 4, 13].
The main contributions of the paper are summarized as follows: (1) An adaptive label propagation model with guided features to generate the illumination-aware and region-aware facial appearance map for seamless face transfer; (2) A facial appearance transfer framework based on the adaptive facial appearance map, which achieves various image-based face blending effects, such as face replacement and face dubbing.
2 Facial Appearance Transfer Framework
2.1 Face Alignment
In face alignment, we aim to match the reference \(I_{ref}\) and the target \(I_{tar}\) to obtain the transformed reference \(I'_{ref}\) and the warped target \(I'_{tar}\) for appearance-map generation and face composition.
Firstly, we use the Viola-Jones face detector [17] and the active shape model (ASM) [3] to locate the 86 landmarks in the facial components of the reference \(S_{ref}\) and the target \(S_{tar}\), respectively. Secondly, the transformed appearance \(I'_{ref}\) and shape \(S'_{ref}\) of the reference are obtained by matching the reference \(I_{ref}\) to the target \(I_{tar}\) using an affine transformation estimated from the landmarks. Finally, we warp the target by the multilevel B-splines approximation (MBA) [9] according to the transformed shape of the reference \(S'_{ref}\), i.e. the appearance of the warped target is \(I'_{tar}=f_{MBA}(I_{tar},S_{tar},S'_{ref})\). For technical details of MBA, we refer readers to [9].
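As an illustration, the sketch below estimates the landmark-driven affine transform by least squares and resamples the reference accordingly; the function names are ours, and the MBA warping of the target (see [9]) is omitted, with OpenCV's warpAffine standing in for the image resampling.

```python
import numpy as np
import cv2

def fit_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine transform mapping src landmarks onto dst landmarks.

    src_pts, dst_pts: (n, 2) arrays of corresponding landmark coordinates.
    """
    n = src_pts.shape[0]
    X = np.hstack([src_pts, np.ones((n, 1))])        # homogeneous source coordinates
    A, _, _, _ = np.linalg.lstsq(X, dst_pts, rcond=None)
    return A.T                                        # rows: [a11 a12 tx; a21 a22 ty]

def align_reference(I_ref, S_ref, S_tar, out_size):
    """Return the transformed reference I'_ref and its transformed shape S'_ref."""
    A = fit_affine(S_ref, S_tar)
    I_ref_aligned = cv2.warpAffine(I_ref, A, out_size)            # out_size = (w, h)
    S_ref_aligned = np.hstack([S_ref, np.ones((len(S_ref), 1))]) @ A.T
    return I_ref_aligned, S_ref_aligned
```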
2.2 Facial Appearance-Map Generation
In face blending, directly pasting the face region of the reference onto the target often fails to perform well. According to our observation, apparent visual artifacts may be introduced into the results, even when gradient-domain Poisson cloning [13] is used, if the reference and the target have large lighting or color variances. To tackle this problem, we construct two different types of facial appearance-maps (\(T_{quot}\) and \(T_{mask}\)) that perform adaptive illumination and region adjustments of the reference for seamless face transfer.
Inspired by Liang’s recent work [11] on face enhancement, we formulate facial appearance-map generation as a label propagation process, which diffuses the features within the initialized facial map to obtain the whole map. Since the two appearance-maps require different diffusion processes, we integrate different regularization structures with different guided features into the label propagation model for the corresponding map diffusion.
Specifically, the appearance-map \(T_{quot}\) aims to relight the reference so that the illumination and color of the reference appearance are consistent with the target, and it uses the quotient image [15] as the initialization. Unlike the original quotient image, which only handles the region within the face, the diffused quotient appearance-map \(T_{quot}\) makes it possible to relight the face with consistent background illumination.
The appearance-map \(T_{mask}\) adaptively selects the facial region of the relit reference for seamless face transfer with smooth region transitions; it uses the binary mask of the facial landmarks as the initial map.
The benefit of the diffusion-based map generation is twofold. Firstly, it is tolerant to small landmark detection errors, since the final map value is determined by the label propagation process rather than by the initialized value. Secondly, the map adapts to the complex facial boundary and texture variance of the region through different regularization structures and guided features. More details of the structure and initialization of the label propagation model for \(T_{quot}\) and \(T_{mask}\) are presented in Sect. 3.
2.3 Face Composition
To produce the output \(I_{out}\) of face transfer, we replace the facial region of the target \(I_{tar}\) with the facial region of \(I'_{ref}\) using the generated facial maps \(T_{quot}\) and \(T_{mask}\) as follows:

\(I_{out} = T_{mask} \circ (T_{quot} \circ I'_{ref}) + (J - T_{mask}) \circ I_{tar},\)   (1)

where \(\circ\) denotes the element-wise product, and J is the all-ones matrix with the same dimensions as \(T_{mask}\). The results of face blending are shown in Fig. 1, and the corresponding generated masks \(T_{quot}\) and \(T_{mask}\) are shown in Fig. 2.
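A minimal sketch of this composition under our notation, assuming float images in [0, 1] and per-pixel maps broadcast over the color channels (the function name is ours):

```python
import numpy as np

def compose(I_tar, I_ref_aligned, T_quot, T_mask):
    """I_out = T_mask o (T_quot o I'_ref) + (J - T_mask) o I_tar (Eq. 1)."""
    if T_quot.ndim == 2:
        T_quot = T_quot[..., None]          # broadcast the maps over RGB channels
    if T_mask.ndim == 2:
        T_mask = T_mask[..., None]
    relit_ref = np.clip(T_quot * I_ref_aligned, 0.0, 1.0)   # illumination transfer
    return T_mask * relit_ref + (1.0 - T_mask) * I_tar
```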
3 Facial Appearance-Map Generation
3.1 Adaptive Label Propagation Model
The appearance-map for face transfer is formulated as a label propagation model with an adaptive regularization structure and guided features, which generates the whole map by propagating the values of the initial map to the remaining pixels according to pixel similarity.
Specifically, the facial appearance-map T with n pixels is mapped onto a graph \(\mathbf g = (\mathcal {V},\mathcal {E})\) of n nodes, where the node \(v_p\) corresponds to the \(p^{th}\) map location, and the edge \(e_{pq}\) links the node pair (p, q) with the pixel similarity \(W_{pq}\). We initialize the node values by R (more details of R are given in Sect. 3.2), and obtain the appearance-map T by propagating the initial values of R through the graph according to the pairwise edge similarities given by the affinity matrix W.
The label propagation for the appearance-map can be formulated as the minimization of the following quadratic cost functional:

\(E(T) = (T-R)^{\top} S (T-R) + \lambda \sum _{p,q} W_{pq}(T_p - T_q)^2 + \epsilon\, T^{\top} T.\)   (2)

The first term is the data term that constrains the diffusion region, where S is an \(n\times n\) diagonal matrix with \(S_{pp}=1\) in the constraint region and \(S_{pp}=0\) otherwise. The third term is a small regularization term that prevents degeneration.

The second term is the smoothness term that determines the local smoothness property of the generated map T, where \(\lambda \) balances the relative weights of the data term and the smoothness term; the weight \(W_{pq}\) is non-zero iff \(v_p\) and \(v_q\) are “neighbors”, and its value measures the similarity between the nodes (pixels). In this paper, we use the typical values \(\lambda =1\) and \(\epsilon =0.0001\) for all the experiments.
The smoothness term is closely related to the graph Laplacian \(L_g\). Specifically, D is a diagonal matrix with \(D_{pp}=\sum _q W_{pq}\), and \(L_g=D-W\) is the un-normalized graph Laplacian. A more compact form of the cost function is then

\(E(T) = (T-R)^{\top} S (T-R) + \lambda\, T^{\top} L_g T + \epsilon\, T^{\top} T.\)   (3)
The derivative of the cost is \(\partial E/\partial T = 2S(T-R) + 2\lambda L_g T + 2\epsilon T\), and T can be obtained by setting the derivative to 0:

\((S + \lambda L_g + \epsilon I)\, T = S R,\)   (4)
which is a linear equation with a symmetric, positive-definite Laplacian matrix \(L = S + \lambda L_g + \epsilon I\). It can be solved efficiently by conjugate gradient descent with multi-level preconditioning [8].
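As a concrete sketch, the code below assembles Eq. 4 on a 4-connected pixel grid and solves it with SciPy's (unpreconditioned) conjugate gradient; the multi-level preconditioner of [8] is omitted for brevity, and the convention of passing edge weights as W_right/W_down images is our own.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

def solve_map(R, S_mask, W_right, W_down, lam=1.0, eps=1e-4):
    """Solve (S + lam*L_g + eps*I) T = S R on an h x w 4-connected grid.

    R:       (h, w) initial map values.
    S_mask:  (h, w) boolean constraint region (S_pp = 1 where True).
    W_right / W_down: weights of the edges to each pixel's right/bottom neighbor.
    """
    h, w = R.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    # Horizontal and vertical edges of the 4-neighborhood graph.
    rows = np.concatenate([idx[:, :-1].ravel(), idx[:-1, :].ravel()])
    cols = np.concatenate([idx[:, 1:].ravel(), idx[1:, :].ravel()])
    vals = np.concatenate([W_right[:, :-1].ravel(), W_down[:-1, :].ravel()])
    W = sp.coo_matrix((vals, (rows, cols)), shape=(n, n))
    W = (W + W.T).tocsr()                          # symmetric affinity matrix
    D = sp.diags(np.asarray(W.sum(axis=1)).ravel())
    L_g = D - W                                    # un-normalized graph Laplacian
    S = sp.diags(S_mask.ravel().astype(float))
    A = (S + lam * L_g + eps * sp.eye(n)).tocsr()  # SPD system matrix of Eq. 4
    T, info = cg(A, S @ R.ravel())
    return T.reshape(h, w)
```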
Eq. 4 can also be solved by a Jacobi iteration, which is similar to the iterative label propagation proposed by Zhu and Ghahramani [20] and to Liang’s mask propagation model [11], except for the weight matrix that controls the diffusion property.
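For reference, a matrix-free Jacobi variant of the same solve (our own vectorized formulation, using the splitting \(A = D_A - \lambda W\)) repeatedly averages each pixel with its weighted 4-neighborhood:

```python
import numpy as np

def jacobi_propagate(R, S_mask, W_right, W_down, lam=1.0, eps=1e-4, iters=500):
    """Jacobi iteration T <- D_A^{-1} (b + lam * W T) for Eq. 4, where D_A is
    the diagonal of the system matrix and b = S R."""
    h, w = R.shape
    S = S_mask.astype(float)
    T = R * S                                     # start from the constrained values
    # Per-pixel sum of incident edge weights (the degree D_pp).
    deg = np.zeros((h, w))
    deg[:, :-1] += W_right[:, :-1]; deg[:, 1:] += W_right[:, :-1]
    deg[:-1, :] += W_down[:-1, :];  deg[1:, :]  += W_down[:-1, :]
    diag = S + lam * deg + eps                    # diagonal of S + lam*L_g + eps*I
    b = S * R
    for _ in range(iters):
        acc = np.zeros((h, w))                    # weighted neighbor sum (W T)
        acc[:, :-1] += W_right[:, :-1] * T[:, 1:]
        acc[:, 1:]  += W_right[:, :-1] * T[:, :-1]
        acc[:-1, :] += W_down[:-1, :] * T[1:, :]
        acc[1:, :]  += W_down[:-1, :] * T[:-1, :]
        T = (b + lam * acc) / diag
    return T
```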
To obtain the appearance-map for face transfer, we construct a new kernel structure with guided features for the weight matrix W of the appearance-map diffusion.
3.2 Diffusions of Facial Appearance-Map
The edge-aware property of the optimization-based label propagation model is mostly controlled by the smoothness term, specifically by the similarity metric of the weight matrix \(W_{pq}\). To produce the appearance-maps for face transfer, we design a new kernel structure driven by two guided features G and \(G'\), which control the local property of the map diffusion; c and \(\alpha \) are parameters that adjust the sensitivity to the guided features, and \(\varepsilon \) is a small constant in the denominator that avoids division by zero (typically \(\varepsilon =0.0001\)). The appearance-maps \(T_{quot}\) and \(T_{mask}\) are generated with different initializations \(R_{\{quot,mask\}}\) and weight matrices \(W_{\{quot,mask\}}\), using the corresponding guided features and parameters.
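The published kernel formula is not reproduced here; the sketch below implements one plausible denominator-form weight consistent with the description above (weights fall off with the guided-feature difference \(|G_p-G_q|^{\alpha}\), modulated by \(G'\) and c), and should be read as an assumption rather than as the paper's exact formula. Its output plugs directly into the solvers sketched in Sect. 3.1.

```python
import numpy as np

def edge_weights(G, G_prime, c=0.5, alpha=1.2, eps=1e-4):
    """Hypothetical guided kernel: W_pq = 1 / (c * G'_pq * |G_p - G_q|^alpha + eps),
    with G'_pq taken as the mean of G' at the two neighboring pixels."""
    def w(dG, mGp):
        return 1.0 / (c * mGp * np.abs(dG) ** alpha + eps)
    W_right = np.zeros_like(G)
    W_down = np.zeros_like(G)
    W_right[:, :-1] = w(G[:, 1:] - G[:, :-1], 0.5 * (G_prime[:, 1:] + G_prime[:, :-1]))
    W_down[:-1, :]  = w(G[1:, :] - G[:-1, :], 0.5 * (G_prime[1:, :] + G_prime[:-1, :]))
    return W_right, W_down
```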
The appearance-map \(T_{quot}\) adjusts the illumination of the reference according to the target, based on the facial shading model of the quotient image [15], as shown in Fig. 2(c). To produce \(T_{quot}\), we set \(R_{quot}=\frac{f_{aWLS}(I'_{tar})}{f_{aWLS}(I'_{ref})}\), where \(R_{quot}\) is the quotient image of the matched target \(I'_{tar}\) and reference \(I'_{ref}\), computed with Liang’s adaptive weighted least squares filter \(f_{aWLS}\) [10] for edge-aware smoothing, as shown in Fig. 2(b). For the weight matrix \(W_{quot}\), we set \(\alpha =1\) and \(G=\log L'_{ref}\), where \(L'_{ref}\) is the luminance of \(I'_{ref}\) (Fig. 2(a)); the value of \(cG'\) is small within the facial region and large in the background, so that the features of the quotient image diffuse across the significant edges within the facial region out to the whole image.
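A sketch of the quotient initialization, with OpenCV's bilateral filter standing in as a generic edge-aware smoother for the adaptive WLS filter \(f_{aWLS}\) of [10]; inputs are assumed to be single-channel float32 luminance images in [0, 1]:

```python
import cv2

def quotient_init(L_tar_warped, L_ref_aligned, eps=1e-3):
    """R_quot = smooth(I'_tar) / smooth(I'_ref), an edge-aware illumination ratio."""
    smooth = lambda L: cv2.bilateralFilter(L, d=9, sigmaColor=0.1, sigmaSpace=7)
    return smooth(L_tar_warped) / (smooth(L_ref_aligned) + eps)
```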
The appearance-map \(T_{mask}\) is responsible for pasting the facial region of the reference onto the target with smooth transitions between the regions. For \(T_{mask}\), we set \(R_{mask}\) as a binary mask derived from the facial landmarks, as shown in Fig. 2(d). To produce \(T_{mask}\) with an adaptive region boundary, we set \(\alpha =1.2\), \(G=\log L'_{ref}\) and \(c=0.5\) with \(G'=J\), where \(L'_{ref}\) is the luminance of \(I'_{ref}\) and J is the all-ones matrix. The map diffusion is then controlled by the gradient of the guided feature G, which ensures a smooth transition of the blended region between \(I'_{ref}\) and \(I_{tar}\), as shown in Fig. 2(e).
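The binary initial map \(R_{mask}\) can be rasterized from the landmark hull, for instance as follows (our helper, using OpenCV):

```python
import cv2
import numpy as np

def mask_init(landmarks, image_shape):
    """R_mask: 1 inside the convex hull of the facial landmarks, 0 outside."""
    mask = np.zeros(image_shape[:2], dtype=np.float32)
    hull = cv2.convexHull(landmarks.astype(np.int32))   # outer facial contour
    cv2.fillConvexPoly(mask, hull, 1.0)
    return mask
```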
4 Experiments
4.1 Basic Evaluation
The evaluations of the facial appearance-maps are shown in Figs. 2 and 3. The results show that the generated \(T_{quot}\) efficiently propagates the quotient values from the constrained regions of \(R_{quot}\) to the other regions, such as the eyes, eyebrows and background, and preserves the illumination consistency of the blended face. The appearance-map \(T_{mask}\) is generated with smooth transitions and adapts to the boundary between the facial regions. The illumination-aware and region-aware diffusion of \(T_{quot}\) and \(T_{mask}\) ensures the robustness of the appearance transfer for faces with different properties, as shown in the face transfer experiments.
The basic experimental evaluations of face blending with the appearance-maps were performed on face pairs with significantly different appearance properties, such as lighting, color, age and gender, as shown in Figs. 4, 5 and 6. The test images were taken from the FEI face database or the Internet. The good visual consistency of the results indicates the effectiveness and robustness of our method.
4.2 Comparison with Related Methods
We also compared our method with related face replacement methods [1, 13]. Figure 4 shows the comparison with Poisson cloning [13]. Because it depends on the gradient and boundary of the blended region, the results of [13] are sensitive to the lighting and color differences between the faces. In contrast, our method obtains natural face blending effects. The comparison with Bitouk’s method in Fig. 5 further validates the effectiveness of our diffusion-based model.
We also compared Dale’s method [4] with ours for face dubbing, which aims to transfer a sequence of facial appearances of the reference to the target. The results indicate that both methods achieve good global visual consistency, as shown in Fig. 6. The close-ups of the local regions in the first row of Fig. 6, however, reveal subtle differences: Dale’s method [4] tends to transfer the lighting of the reference to the target, while ours tends to preserve the original appearance of the target, which is complementary to [4].
5 Conclusion
This paper proposes a label propagation model with adaptive regularization to achieve facial blending with good visual consistency. Specifically, illumination-aware and region-aware facial appearance maps are generated by diffusion with different guided features. Experiments demonstrate the effectiveness and robustness of our method for face replacement and face dubbing.
References
Bitouk, D., Kumar, N., Dhillon, S., Belhumeur, P., Nayar, S.K.: Face swapping: automatically replacing faces in photographs. ACM Trans. Graph. 27(3), 39 (2008)
Chapelle, O., Schölkopf, B., Zien, A.: Semi-Supervised Learning. MIT Press, Cambridge (2006)
Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active shape models-their training and application. Comput. Vis. Image Underst. 61(1), 38–59 (1995)
Dale, K., Sunkavalli, K., Johnson, M.K., Vlasic, D., Matusik, W., Pfister, H.: Video face replacement. ACM Trans. Graph. 30(6), 130 (2011)
Garrido, P., Valgaerts, L., Rehmsen, O., Thormahlen, T., Perez, P., Theobalt, C.: Automatic face reenactment. In: Proceedings of CVPR, pp. 4217–4224 (2014)
Kemelmacher-Shlizerman, I., Suwajanakorn, S., Seitz, S.: Illumination-aware age progression. In: Proceedings of CVPR, pp. 3334–3341 (2014)
Korshunova, I., Shi, W., Dambre, J., Theis, L.: Fast face-swap using convolutional neural networks. In: Proceedings of ICCV, pp. 3677–3685 (2017)
Krishnan, D., Fattal, R., Szeliski, R.: Efficient preconditioning of Laplacian matrices for computer graphics. ACM Trans. Graph. 32(4), 142 (2013)
Lee, S., Wolberg, G., Shin, S.Y.: Scattered data interpolation with multilevel B-splines. IEEE Trans. Vis. Comput. Graph. 3(3), 228–244 (1997)
Liang, L., Jin, L.: A new face relighting method based on edge-preserving filter. IEICE Trans. Inf. Syst. E96–D(12), 2904–2907 (2013)
Liang, L., Jin, L., Liu, D.: Edge-aware label propagation for mobile facial enhancement on the cloud. IEEE Trans. Circuits Syst. Video Technol. 27(1), 125–138 (2017)
Lukac, R.: Computational Photography: Methods and Applications. CRC Press, Boca Raton (2010)
Pérez, P., Gangnet, M., Blake, A.: Poisson image editing. ACM Trans. Graph. 22, 313–318 (2003)
Reinhard, E., Efros, A.A., Kautz, J., Seidel, H.P.: On visual realism of synthesized imagery. Proc. IEEE 101(9), 1998–2007 (2013)
Shashua, A., Riklin-Raviv, T.: The quotient image: class-based re-rendering and recognition with varying illuminations. IEEE Trans. Pattern Anal. Mach. Intell. 23(2), 129–139 (2001)
Thies, J., Zollhöfer, M., Nießner, M., Valgaerts, L., Stamminger, M., Theobalt, C.: Real-time expression transfer for facial reenactment. ACM Trans. Graph. 34(6), 183 (2015)
Viola, P., Jones, M.: Robust real-time face detection. Int. J. Comput. Vis. 57(2), 137–154 (2004)
Zhang, Y., Zheng, L., Thing, V.L.: Automated face swapping and its detection. In: IEEE International Conference on Signal and Image Processing, pp. 15–19 (2017)
Zhou, P., Han, X., Morariu, V.I., Davis, L.S.: Two-stream neural networks for tampered face detection. In: Proceedings of CVPR Workshops, pp. 1831–1839 (2017)
Zhu, X., Ghahramani, Z.: Learning from labeled and unlabeled data with label propagation. Technical report CMU-CALD-02-107, Carnegie Mellon University (2002)