Abstract
This study proposes a new learning-based smile synthesis system that automatically transforms a given neutral facial image into a smiling one in a specified style. Although example-based face synthesis frameworks have made great progress recently, the construction of a robust transformation, the preservation of personal characteristics, and the production of high-quality images remain unresolved problems. The proposed framework addresses these problems using a new expression-attention-guided global parametric model and local non-parametric model. Our key innovations are (a) a flexible framework design that produces expression attention regions with only expression category labels as supervision, (b) a novel smile style analysis framework that discovers different smile styles in the training samples, which are then used to guide more robust face modeling, and (c) a two-step expression transformation approach that integrates a global parametric model for robust prediction of expression geometry with a local non-parametric model for high-quality image generation. Experimental results show that, given limited training data, the facial images obtained using the proposed framework are more vivid than those generated by existing synthesis methods. In addition, the proposed method extends directly to the image-to-image translation task to produce high-quality face hallucinations, which is of great importance in digital entertainment.
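To make the two-step design concrete, the sketch below shows one way such a pipeline could be organized: a global parametric model (here, a linear regression) first predicts the smiling geometry from the neutral landmarks and a smile-style code, and a local non-parametric step then replaces each patch of the geometry-warped image with its nearest exemplar from smiling training images. All names (predict_smile_geometry, local_patch_synthesis), the linear-regression form, and the SSD patch lookup are illustrative assumptions, not the authors' implementation.

```python
# Illustrative skeleton of a two-step neutral-to-smile pipeline
# (assumed structure; not the authors' code).
import numpy as np

def predict_smile_geometry(landmarks, style_code, W, b):
    """Global parametric step: a linear regression mapping neutral
    landmarks plus a smile-style code to smiling landmarks."""
    x = np.concatenate([landmarks.ravel(), style_code])
    return (W @ x + b).reshape(landmarks.shape)

def local_patch_synthesis(warped_img, exemplars, patch=16):
    """Local non-parametric step: substitute each non-overlapping patch
    of the geometry-warped (grayscale) image with its nearest exemplar
    patch, harvested from smiling training images."""
    out = warped_img.copy()
    h, w = warped_img.shape
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            q = warped_img[i:i + patch, j:j + patch].ravel()
            d = ((exemplars - q) ** 2).sum(axis=1)  # SSD to every exemplar
            out[i:i + patch, j:j + patch] = exemplars[d.argmin()].reshape(patch, patch)
    return out
```

In the paper's terms, the first step supplies robust expression geometry while the second restores person-specific, high-frequency texture; a deployed system would presumably also blend patch seams and restrict synthesis to the expression attention regions.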
Data availability
The data that support the findings of this study are available from the corresponding author, Ching-Ting Tu, upon reasonable request.
Change history
23 December 2022
A Correction to this paper has been published: https://doi.org/10.1007/s11042-022-14324-7
References
Bouaziz S, Pauly M (2014) Semi-supervised facial animation retargeting. EPFL Technical Report #202143
Bozorgtabar B, Mahapatra D, Thiran J-P (2020) ExprADA: adversarial domain adaptation for facial expression analysis. Patt Recognit:107111
Choi Y, Choi M-J, Kim M, Ha J-W, Kim S, Choo J (2018) StarGAN: unified generative adversarial networks for multi-domain image-to-image translation. CVPR:8789–8797
Choi Y, Uh Y, Yoo J, Ha J-W (2020) StarGAN v2: diverse image synthesis for multiple domains. CVPR:8185–8194
Chowdhary CL, Patel PV, Kathrotia KJ, Attique M, Kumaresan P, Ijaz MF (2020) Analytical study of hybrid techniques for image encryption and decryption. Sensors:5162
Deng Z, Neumann U, Lewis JP, Kim TY, Bulut M, Narayanan S (2006) Expressive facial animation synthesis by learning speech coarticulation and expression spaces. IEEE Trans Vis Comput Graph 12:1523–1534
Etoundi CML, Nkapkop JDD, Tsafack N, Ngono JM, Ele P, Wozniak M, Shafi J, Ijaz MF (2022) A novel compound-coupled hyperchaotic map for image encryption. Symmetry:493
Fan G-F, Zhang L-Z, Yu M, Hong W-C, Dong S-Q (2022) Applications of random forest in multivariable response surface for short-term load forecasting. Int J Electrical Power Energy Syst
Freeman WT, Pasztor EC (1999) Learning low-level vision. ICCV:1182–1189
Ghahramani Z, Hinton GE (1997) The EM algorithm for mixtures of factor analyzers. Technical Report CRG-TR-96-1
Gong B, Wang Y, Liu J, Tang X (2009) Automatic facial expression recognition on a single 3D face by exploring shape deformation. ACM Multimedia:569–572
Huang D, Torre FDL (2010) Bilinear kernel reduced rank regression for facial expression synthesis. ECCV:364–377
Huang L, Su C (2006) Facial expression synthesis using manifold learning and belief propagation. Soft Comput:1193–1200
Kanade T, Cohn JF, Tian Y (2000) Comprehensive database for facial expression analysis. Int Conf Aut Face Gesture Recogn:46–53
Khan N, Akram A, Mahmood A, Ashraf S, Murtaza K (2020) Masked linear regression for learning local receptive fields for facial expression synthesis. Int J Comput Vis 128:1433–1454
Li K, Dai Q, Wang R, Liu Y, Xu F, Wang J (2014) A data-driven approach for facial expression retargeting in video. IEEE Trans Multimedia 16:299–310
Liu W, Chen W, Yang Z, Shen L (2021) Translate the facial regions you like using self-adaptive region translation. AAAI 35:2180–2188
Lu Z, Hu T, Song L, Zhang Z, He R (2018) Conditional expression synthesis with face parsing transformation. ACM Multimedia:1083–1091
Mohammed U, Prince SJD, Kautz J (2009) Visio-lization: generating novel facial images. SIGGRAPH
Noh JY, Neumann U (2006) Expression cloning. ACM SIGGRAPH Courses
Peng Y, Yin H (2019) ApprGAN: appearance-based GAN for facial expression synthesis. IET Image Process 13:2706–2715
Pumarola A, Agudo A, Martínez AM, Sanfeliu A, Moreno-Noguer F (2020) GANimation: one-shot anatomically consistent facial animation. Int J Comput Vis 128:698–713
Sahoo KK, Dutta I, Ijaz MF, Wozniak M, Singh PK (2021) TLEFuzzyNet: fuzzy rank-based ensemble of transfer learning models for emotion recognition from human speeches. IEEE Access 9:166518–166530
Song Y, Bao L, Yang Q, Yang M-H (2014) Real-time exemplar-based face sketch synthesis. ECCV:800–813
Tamang J, Nkapkop JDD, Ijaz MF, Prasad PK, Tsafack N, Saha A, Kengne J, Son Y (2021) Dynamical properties of ion-acoustic waves in space plasma and its application to image encryption. IEEE Access 9:18762–18782
Tang H, Liu H, Xu D, Torr PHS, Sebe N (2021) AttentionGAN: unpaired image-to-image translation using attention-guided generative adversarial networks. IEEE Trans Neural Networks Learn Syst
Torralba A, Murphy KP, Freeman WT (2007) Sharing visual features for multiclass and multi-view object detection. IEEE Trans Patt Anal Mach Intell 29:854–869
Tran DL, Walecki RT, Rudovic O, Eleftheriadis S, Schuller BW, Pantic M (2017) DeepCoder: semi-parametric variational autoencoders for automatic facial action coding. ICCV:3209–3218
Wang S, Gu XD, Qin H (2008) Automatic non-rigid registration of 3D dynamic data for facial expression. CVPR:1–8
Xia J, Quynh DTP, He Y, Chen X, Hoi SCH (2012) Modeling and compressing 3-D facial expressions using geometry videos. IEEE Trans Circ Syst Video Technol 22:77–90
Xu W, Xie X, Lai J (2021) RelightGAN: instance-level generative adversarial network for face illumination transfer. IEEE Trans Image Process 30:3450–3460
Yun T, Guan L (2013) A deformable 3-D facial expression model for dynamic human emotional state recognition. IEEE Trans Circ Syst Video Technol:142–157
Zhang Q, Liu Z, Guo G, Terzopoulos D, Shum HY (2006) Geometry-driven photorealistic facial expression synthesis. IEEE Trans Vis Comput Graph 12(1):48–60
Zhang Y, Ji Q, Zhu Z, Yi B (2008) Dynamic facial expression analysis and synthesis with MPEG-4 facial animation parameters. IEEE Trans Circ Syst Video Technol 18:1383–1396
Zhang F, Zhang T, Mao Q, Xu C (2020) Geometry guided pose-invariant facial expression recognition. IEEE Trans Image Process:4445–4460
Zhang F, Zhang T, Mao Q, Xu C (2020) A unified deep model for joint facial expression recognition, face synthesis, and face alignment. IEEE Trans Image Process 29:6574–6589
Zhu J-Y, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. ICCV:2242–2251
Acknowledgements
This work was supported by the Ministry of Science and Technology of Taiwan under Grant number MOST 109-2221-E-005-056-MY2. We thank the anonymous reviewers for their insightful comments, which improved this paper. We also thank all the authors who released their source code publicly, allowing us to adapt it for the comparison methods in this study.
Funding
This work was supported by the Ministry of Science and Technology of Taiwan under Grant number MOST 109-2221-E-005-056-MY2.
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised: The affiliation of the 2nd and 3rd authors in the original publication of this article was incorrect.
Appendix
The detailed algorithm for extracting expression-variant patches is given in Algorithm 1, where the smiling images and neutral images in the training set serve as positive and negative samples, respectively. The goal of the algorithm is to extract a set of discriminative features that separate the positive samples from the negative ones; the facial regions corresponding to these extracted features are defined as expression-variant patches (EVPs) in this study. A sketch of this selection procedure follows the algorithm caption below.
Algorithm 1 Expression-variant patch extraction by the GentleBoost algorithm [27].
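Since the algorithm figure itself is not reproduced here, the following is a minimal sketch of the GentleBoost feature selection that the caption describes, under the assumptions that each face is summarized by one scalar feature per candidate patch and that regression stumps are the weak learners; the helper names (fit_stump, extract_evp) are hypothetical, not from the paper or from [27].

```python
# Hedged sketch of expression-variant patch (EVP) extraction with GentleBoost.
import numpy as np

def fit_stump(X, y, w):
    """Fit a weighted regression stump f(x) = a*[x_j > t] + b by weighted
    least squares, as in GentleBoost. Returns (j, t, a, b, err)."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            mask = X[:, j] > t
            wp, wn = w[mask].sum(), w[~mask].sum()
            if wp == 0 or wn == 0:
                continue
            # Weighted means of the labels on each side of the threshold.
            b = np.sum(w[~mask] * y[~mask]) / wn
            a = np.sum(w[mask] * y[mask]) / wp - b
            err = np.sum(w * (y - (a * mask + b)) ** 2)
            if best is None or err < best[-1]:
                best = (j, t, a, b, err)
    return best

def extract_evp(X, y, n_rounds=10):
    """X: (n_samples, n_patches) patch-wise features; y in {+1, -1}
    (smiling = +1, neutral = -1). Returns indices of the candidate
    patches selected by the boosted stumps."""
    n = len(y)
    w = np.full(n, 1.0 / n)           # uniform initial sample weights
    selected = []
    for _ in range(n_rounds):
        j, t, a, b, _ = fit_stump(X, y, w)
        pred = a * (X[:, j] > t) + b  # weak learner output
        w *= np.exp(-y * pred)        # GentleBoost weight update
        w /= w.sum()
        selected.append(j)
    return sorted(set(selected))
```

The indices returned by extract_evp identify the candidate patches on which the boosted classifier relies to separate smiling from neutral faces; in the paper's terminology, these patch locations are the EVPs.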
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tu, CT., Hsieh, SH., Chen, KL. et al. Personalized smile synthesis using attention-guided global parametric model and local non-parametric model. Multimed Tools Appl 82, 21585–21609 (2023). https://doi.org/10.1007/s11042-022-14260-6