research-article
DOI: 10.1145/3476099.3484317

Invertable Frowns: Video-to-Video Facial Emotion Translation

Published: 20 October 2021

ABSTRACT

We present Wav2Lip-Emotion, a video-to-video translation architecture that modifies facial expressions of emotion in videos of speakers. Previous work modifies emotion in images, uses a single image to produce a video with animated emotion, or puppets facial expressions in videos with landmarks from a reference video. However, many use cases, such as modifying an actor's performance in post-production, coaching individuals to be more animated speakers, or touching up emotion in a teleconference, require a video-to-video translation approach. We explore a method to maintain speakers' identity and pose while translating their expressed emotion. Our approach extends an existing multi-modal lip synchronization architecture to modify the speaker's emotion using L1 reconstruction and pre-trained emotion objectives. We also propose a novel automated emotion evaluation approach and corroborate it with a user study. Both find that we succeed in modifying emotion while maintaining lip synchronization. Visual quality is somewhat diminished, with a trade-off between greater emotion modification and visual quality across model variants. Nevertheless, we demonstrate (1) that facial expressions of emotion can be modified with nothing other than L1 reconstruction and pre-trained emotion objectives and (2) that our automated emotion evaluation approach aligns with human judgements.
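To make the stated objective concrete, the following is a minimal sketch in PyTorch, assuming a standard setup: an L1 reconstruction loss between generated and reference frames, plus an auxiliary loss from a frozen, pre-trained emotion classifier. This is not the authors' released implementation; the names (freeze, emotion_clf), the loss weights, and the use of cross-entropy as the emotion objective are illustrative assumptions.

import torch.nn as nn
import torch.nn.functional as F

def freeze(model: nn.Module) -> nn.Module:
    # Freeze the pre-trained emotion classifier so only the generator is updated.
    for p in model.parameters():
        p.requires_grad_(False)
    return model.eval()

def emotion_translation_loss(generated, reference, target_emotion, emotion_clf,
                             l1_weight=1.0, emo_weight=0.1):
    # L1 reconstruction keeps identity, pose, and lip articulation close to
    # the reference frames.
    l1 = F.l1_loss(generated, reference)

    # The frozen classifier scores the generated frames against the desired
    # emotion label; gradients still flow through `generated` into the
    # generator, while the classifier's own weights stay fixed.
    logits = emotion_clf(generated)
    emo = F.cross_entropy(logits, target_emotion)

    return l1_weight * l1 + emo_weight * emo

A frozen classifier of this kind could, under the same assumptions, also drive the automated emotion evaluation mentioned above by scoring whether generated frames express the target emotion; the paper's actual evaluation procedure is given in the full text.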


Published in

ADGD '21: Proceedings of the 1st Workshop on Synthetic Multimedia - Audiovisual Deepfake Generation and Detection
October 2021, 39 pages
ISBN: 9781450386821
DOI: 10.1145/3476099
Program Chairs: Stefan Winkler, Weiling Chen, Abhinav Dhall, Pavel Korshunov

Copyright © 2021 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States




