Abstract
Facial expression synthesis has applications in animation, human-computer interaction, entertainment, and the training of people with mental disorders. It aims to alter the facial expression in an image, usually by reenacting the facial movements of an example (source) image on the target image. Deformation-based approaches usually select the example image manually, so results vary depending on this choice. This study differs from the literature by proposing and evaluating techniques that use the similarity between facial images to choose the source image. The primary goal is to investigate how the choice of source image influences the generated facial expressions of emotions. We propose three techniques for selecting similar faces in the facial expression synthesis pipeline and compare them with other approaches. We also compare the generated synthetic emotions with the results of recent methods from the literature using objective metrics. Our findings suggest that one of the proposed techniques achieved better results in the search for similar faces, and similar or better synthesis results compared with the literature. Additionally, a visual analysis showed that similar faces can improve the realism of synthetic images, especially compared with randomly selected facial images.
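One way to operationalize similarity-based source selection, as described above, is a nearest-neighbor search over face embeddings (for example, descriptors produced by a face-recognition network such as dlib's). The sketch below is illustrative only and is not the exact method evaluated in the paper; the embeddings and their dimensionality are hypothetical, standing in for whatever face descriptor a pipeline would actually compute.

```python
import numpy as np

def select_similar_face(target_embedding, candidate_embeddings):
    """Return the index of the candidate face whose embedding is
    closest (Euclidean distance) to the target face embedding."""
    dists = np.linalg.norm(candidate_embeddings - target_embedding, axis=1)
    return int(np.argmin(dists))

# Toy example: three candidate "faces" described by 4-D embeddings.
target = np.array([0.1, 0.2, 0.3, 0.4])
candidates = np.array([
    [0.9, 0.8, 0.7, 0.6],   # dissimilar face
    [0.1, 0.2, 0.3, 0.5],   # very close to the target
    [0.5, 0.5, 0.5, 0.5],
])
print(select_similar_face(target, candidates))  # -> 1
```

The selected candidate would then serve as the example image whose facial movements are reenacted on the target, replacing the manual choice that deformation-based approaches typically require.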
Acknowledgements
The authors would like to thank the VISGRAF Lab at the Instituto Nacional de Matemática Pura e Aplicada and the Multimedia Understanding Group at the Aristotle University of Thessaloniki for providing the images used in this study. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001, the Dean’s Office for Research of the University of São Paulo (process 18.5.245.86.7 – Intelligent psychiatric disorder classification system based on facial anthropometric measurements), Brazilian National Council of Scientific and Technological Development (CNPq) (Process 157535/2017-7) and São Paulo Research Foundation (FAPESP) (Process 14/50889-7): National Institute of Science and Technology – Medicine Assisted by Scientific Computing (INCT-MACC).
Cite this article
Testa, R.L., Machado-Lima, A. & Nunes, F.L.S. Facial expression synthesis based on similar faces. Multimed Tools Appl 80, 36465–36489 (2021). https://doi.org/10.1007/s11042-021-11525-4