DOI: 10.1145/3678957.3685712
Research Article · Open Access

Generating Facial Expression Sequences of Complex Emotions with Generative Adversarial Networks

Published: 04 November 2024

Abstract

There is rising interest in animating realistic virtual agents for a variety of purposes across different domains. Such a task requires systems capable of generating complex mental states on par with human emotional complexity. Given the high representational capacity of Generative Adversarial Networks (GANs), they are natural candidates for such applications. In this work, we propose a conditional GAN model for generating sequences of facial expressions of categorical complex emotions. Trained on a scarce and highly imbalanced dataset, the proposed model is able to generate realistic variable-length sequences in a single inference step. The generated expressions cover 24 emotional states in total and are encoded in the Facial Action Coding System (FACS) format. In the absence of meaningful objective evaluation methods, we propose a deep-learning-based metric to assess the realism of generated Action Unit (AU) sequences: the Action Unit Fréchet Inception Distance (AUFID). Objective and subjective results validate the realism of our generated samples.
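
The abstract does not describe the model internals, so the sketch below is only a plausible PyTorch shape for a conditional generator of this kind: a noise vector concatenated with a one-hot label over the 24 emotional states, mapped to a sequence of AU-intensity frames in one forward pass. The AU count (17, matching the intensity outputs of OpenFace 2.0), the latent size, the layer widths, and every name here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

NUM_EMOTIONS = 24  # categorical complex emotional states (from the abstract)
NUM_AUS = 17       # assumption: AU intensities as estimated by OpenFace 2.0
LATENT_DIM = 128   # assumption: size of the noise vector

class ConditionalSequenceGenerator(nn.Module):
    """Hypothetical conditional generator: noise + one-hot emotion label
    mapped to a batch of AU-intensity sequences, non-autoregressively."""

    def __init__(self, max_len: int = 64):
        super().__init__()
        self.max_len = max_len
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + NUM_EMOTIONS, 512),
            nn.GELU(),
            nn.Linear(512, 512),
            nn.GELU(),
            nn.Linear(512, max_len * NUM_AUS),
        )

    def forward(self, z: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Condition the generator by concatenating the label to the noise.
        onehot = nn.functional.one_hot(labels, NUM_EMOTIONS).float()
        out = self.net(torch.cat([z, onehot], dim=1))
        return out.view(-1, self.max_len, NUM_AUS)

# Single inference step: one AU sequence per sampled emotion label.
z = torch.randn(4, LATENT_DIM)
labels = torch.randint(0, NUM_EMOTIONS, (4,))
sequences = ConditionalSequenceGenerator()(z, labels)  # shape [4, 64, 17]
```

Variable-length output could, for instance, be obtained by emitting an extra per-frame validity channel and truncating at the first invalid frame; the abstract does not say which mechanism the authors use.

The proposed AUFID metric follows the Fréchet Inception Distance template: embed real and generated samples with a pretrained network, fit a Gaussian to each set of embeddings, and compute the Fréchet distance d^2 = ||mu_1 - mu_2||^2 + Tr(Sigma_1 + Sigma_2 - 2 (Sigma_1 Sigma_2)^(1/2)). Below is a minimal sketch of that final distance computation, assuming embeddings of AU sequences have already been extracted (the embedding network itself is the paper's contribution and is not reproduced here).

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between the Gaussians N(mu1, sigma1), N(mu2, sigma2)."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)  # matrix square root
    if np.iscomplexobj(covmean):  # drop tiny imaginary parts from numerics
        covmean = covmean.real
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

def aufid(real_feats: np.ndarray, fake_feats: np.ndarray) -> float:
    """AUFID-style score from [n_samples, feat_dim] embedding arrays."""
    mu_r, sig_r = real_feats.mean(axis=0), np.cov(real_feats, rowvar=False)
    mu_f, sig_f = fake_feats.mean(axis=0), np.cov(fake_feats, rowvar=False)
    return float(frechet_distance(mu_r, sig_r, mu_f, sig_f))
```

Lower scores indicate that the generated AU sequences are statistically closer to the real ones in embedding space, mirroring how FID is read for images.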

Supplemental Material

WEBM Files (16)
Videos of sequences (real and synthetic) visualized on OpenFACS.
PDF File
Appendix: PCA results for all 24 categorical emotional states.

    Published In

    ICMI '24: Proceedings of the 26th International Conference on Multimodal Interaction
    November 2024
    725 pages
    ISBN: 9798400704628
    DOI: 10.1145/3678957
    This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 November 2024

    Author Tags

    1. Complex Emotional States
    2. FACS
    3. Generative Adversarial Networks
    4. Synthetic Action Units

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICMI '24: International Conference on Multimodal Interaction
    November 4 - 8, 2024
    San Jose, Costa Rica

    Acceptance Rates

    Overall Acceptance Rate 453 of 1,080 submissions, 42%
