Mixed feelings: expression of non-basic emotions in a muscle-based talking head


Abstract

We present an algorithm for generating facial expressions for a continuum of pure and mixed emotions of varying intensity. Based on the observation that in natural interaction among humans, shades of emotion are encountered far more frequently than expressions of basic emotions, a method is required that can generate more than Ekman’s six basic emotions (joy, anger, fear, sadness, disgust and surprise). To this end, we have adapted the algorithm proposed by Tsapatsoulis et al. [1] to be applicable to a physics-based facial animation system and a single, integrated emotion model. The physics-based facial animation system was combined with an equally flexible and expressive text-to-speech synthesis system, built upon the same emotion model, to form a talking head capable of expressing non-basic emotions of varying intensities. With a variety of life-like intermediate facial expressions captured as snapshots from the system, we demonstrate the appropriateness of our approach.
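The abstract only summarizes the blending scheme: the adapted algorithm places the six basic emotions as anchors in a dimensional emotion space and derives intermediate expressions from them at a strength matching the emotion’s intensity. As a rough illustration of how such a scheme can be realized, the Python sketch below blends the two angularly closest basic-emotion profiles in a 2D activation-evaluation space. All anchor coordinates, muscle names, contraction values and the weighting rule itself are illustrative assumptions, not the paper’s or Tsapatsoulis et al.’s actual data or code.

```python
import math

# Illustrative anchor positions of the six basic emotions in a 2D
# activation-evaluation space. Coordinates are placeholders, not the
# paper's calibrated values.
ANCHORS = {
    "joy":      (0.6,  0.8),
    "surprise": (0.9,  0.2),
    "fear":     (0.8, -0.6),
    "anger":    (0.7, -0.8),
    "disgust":  (0.3, -0.9),
    "sadness":  (-0.6, -0.7),
}

# Illustrative muscle-contraction profiles at full intensity. A real
# physics-based face drives many more muscles; names and values here
# are invented for demonstration.
PROFILES = {
    "joy":      {"zygomaticus_major": 1.0, "orbicularis_oculi": 0.5},
    "surprise": {"frontalis": 1.0, "levator_palpebrae": 0.8},
    "fear":     {"frontalis": 0.7, "risorius": 0.6},
    "anger":    {"corrugator": 1.0, "levator_labii": 0.4},
    "disgust":  {"levator_labii": 1.0, "corrugator": 0.5},
    "sadness":  {"corrugator": 0.6, "depressor_anguli_oris": 0.9},
}

def mixed_expression(activation, evaluation):
    """Blend the two angularly closest basic-emotion profiles.

    Intensity is taken to grow with distance from the neutral origin,
    so points near the centre yield subtle, low-intensity expressions.
    """
    target_angle = math.atan2(evaluation, activation)
    intensity = min(1.0, math.hypot(activation, evaluation))

    def angular_distance(name):
        ax, ay = ANCHORS[name]
        d = abs(math.atan2(ay, ax) - target_angle)
        return min(d, 2.0 * math.pi - d)

    # Pick the two nearest anchors and weight them inversely to their
    # angular distance from the target emotion.
    first, second = sorted(ANCHORS, key=angular_distance)[:2]
    d1, d2 = angular_distance(first), angular_distance(second)
    w1 = d2 / (d1 + d2) if (d1 + d2) > 0.0 else 1.0

    # Weighted sum of the two profiles, scaled by overall intensity.
    blended = {}
    for name, weight in ((first, w1), (second, 1.0 - w1)):
        for muscle, contraction in PROFILES[name].items():
            blended[muscle] = blended.get(muscle, 0.0) + weight * contraction * intensity
    return blended

# Example: a point in the negative-evaluation, low-activation region
# blends its two nearest basic emotions at reduced intensity.
print(mixed_expression(-0.4, -0.6))
```

Weighting by angular proximity keeps nearby shades of emotion continuous with their neighbouring basic expressions, while scaling by distance from the origin lets the same direction in emotion space produce anything from a subtle hint to a full-blown display.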


Notes

  1. A publicly accessible web interface can be found at http://mary.dfki.de.

  2. A phrase is a part of a sentence delimited by grammatical pauses.

References

  1. Tsapatsoulis N, Raouzaiou A, Kollias S, Cowie R, Douglas-Cowie E (2002) Emotion recognition and synthesis based on MPEG-4 FAPs. In: MPEG-4 facial animation: the standard, implementations, applications. Wiley, Hillsdale, pp 141–167

  2. André E, Dybkjær L, Minker W, Heisterkamp P (eds) (2004) Proceedings of the tutorial and research workshop on affective dialogue systems (ADS04), vol 3068 of Lecture Notes in Artificial Intelligence, Kloster Irsee, Germany. Springer, Berlin Heidelberg New York

  3. Cowie R, Cornelius R (2003) Describing the emotional states that are expressed in speech. Speech Commun Spec Issue Speech Emotion 40(1–2):5–32

  4. Scherer K (2000) Psychological models of emotion. Neuropsychology of emotion. Oxford University Press, Oxford, pp 137–162

  5. The HUMAINE network portal. http://emotion-research.net

  6. Schröder M, Cowie R, Douglas-Cowie E, Westerdijk M, Gielen S (2001) Acoustic correlates of emotion dimensions in view of speech synthesis. In: Proceedings of Eurospeech’01, vol 1, pp 87–90

  7. Schröder M (2004) Dimensional emotion representation as a basis for speech synthesis with non-extreme emotions. In: Proceedings of the workshop on affective dialogue systems, Kloster Irsee, Germany, pp 209–220

  8. Dutoit Th (1997) An introduction to text-to-speech synthesis. Kluwer, Dordrecht

  9. Klabbers E, Stöber K, Veldhuis R, Wagner P, Breuer S (2001) Speech synthesis development made easy: the Bonn Open Synthesis System. In: Proceedings of Eurospeech 2001, Aalborg, pp 521–524

  10. Schröder M, Trouvain J (2003) The German text-to-speech synthesis system MARY: a tool for research, development and teaching. Int J Speech Technol 6:365–377. http://mary.dfki.de

  11. Banse R, Scherer K (1996) Acoustic profiles in vocal emotion expression. J Pers Soc Psychol 70(3):614–636

  12. Yang L (2001) Prosody as expression of emotion. In: Cavé Ch (ed) Proceedings of ORAGE 2001, Oralité et gestualité, pp 209–212

  13. Schröder M (2001) Emotional speech synthesis: a review. In: Proceedings of Eurospeech 2001, Aalborg, pp 561–564. http://www.dfki.de/~schroed

  14. Allen J, Hunnicutt S, Klatt DH (1987) From text to speech: the MITalk system. Cambridge University Press, Cambridge

  15. Cahn J (1990) The generation of affect in synthesized speech. J Am Voice I/O Soc 8:1–19

  16. Black AW, Campbell N (1995) Optimising selection of units from speech databases for concatenative synthesis. In: Proceedings of Eurospeech 1995, Madrid, pp 581–584

  17. Johnson W, Narayanan S, Whitney R, Das R, Bulut M, LaBore C (2002) Limited domain synthesis of expressive military speech for animated characters. In: Proceedings of the 7th international conference on spoken language processing, Denver

  18. Dutoit Th, Pagel V, Pierret N, Bataille F, van der Vrecken O (1996) The MBROLA project: towards a set of high quality speech synthesisers free of use for non commercial purposes. In: Proceedings of the 4th international conference on spoken language processing, Philadelphia, pp 1393–1396

  19. Schröder M, Grice M (2003) Expressing vocal effort in concatenative synthesis. In: Proceedings of the 15th international congress of phonetic sciences, Barcelona

  20. Lee Y, Terzopoulos D, Waters K (1995) Realistic face modeling for animation. In: Proceedings of SIGGRAPH’95, pp 55–62

  21. Kähler K, Haber J, Seidel HP (2001) Geometry-based muscle modeling for facial animation. In: Proceedings of Graphics Interface, pp 37–46

  22. Bregler Ch, Covell M, Slaney M (1997) Video rewrite: driving visual speech with audio. In: Proceedings of SIGGRAPH ’97. ACM Press, Palo Alto, pp 353–360

  23. Brand M (1999) Voice puppetry. In: Proceedings of SIGGRAPH ’99, pp 21–28

  24. Ezzat T, Geiger G, Poggio T (2002) Trainable videorealistic speech animation. In: Proceedings of SIGGRAPH’02, pp 388–398

  25. Parke F (1974) A parametric model for human faces. PhD thesis, University of Utah, Salt Lake City

  26. Cohen M, Massaro D (1993) Modeling coarticulation in synthetic visual speech. In: Magnenat-Thalmann N, Thalmann D (eds) Models and techniques in computer animation, pp 139–156

  27. Pelachaud C, Badler N, Steedman M (1991) Linguistic issues in facial animation. In: Magnenat-Thalmann N, Thalmann D (eds) Computer animation ’91

  28. Kalberer G, Müller P, Van Gool L (2003) A visual speech generator. In: Proceedings of Videometrics VII, IS&T/SPIE, pp 173–183

  29. Lee S, Badler J, Badler N (2002) Eyes alive. In: Proceedings of SIGGRAPH’02, pp 637–644

  30. Pearce A, Wyvill B, Wyvill G, Hill D (1986) Speech and expression: a computer solution to face animation. In: Proceedings of Graphics Interface ’86, pp 136–140

  31. Ip H, Chan C (1996) Script-based facial gesture and speech animation using a NURBS-based face model. Comput Graphics 20(6):881–891

  32. Kalra P, Mangili A, Magnenat-Thalmann N, Thalmann D (1991) SMILE: a multilayered facial animation system. In: Proceedings of IFIP WG 5.10, Tokyo, pp 189–198

  33. Cassell J, Pelachaud C, Badler N, Steedman M, Achorn B, Becket T, Douville B, Prevost S, Stone M (1994) Animated conversation: rule-based generation of facial expression, gesture and spoken intonation for multiple conversational agents. In: Proceedings of SIGGRAPH ’94, pp 413–420

  34. Pelachaud C, Badler N, Steedman M (1996) Generating facial expressions for speech. Cogn Sci 20(1):1–46

  35. Albrecht I, Haber J, Seidel H-P (2002) Automatic generation of non-verbal facial expressions from speech. In: Proceedings of CGI, pp 283–293

  36. Albrecht I, Haber J, Kähler K, Schröder M, Seidel H-P (2002) May I talk to you? :-)—facial animation from text. In: Proceedings of Pacific Graphics, pp 77–86

  37. Ekman P, Keltner D (1997) Universal facial expressions of emotion: an old controversy and new findings. In: Segerstråle U, Molnár P (eds) Nonverbal communication: where nature meets culture. Lawrence Erlbaum Associates Inc., Mahwah, pp 27–46

  38. Byun M, Badler N (2002) FacEMOTE: qualitative parametric modifiers for facial animations. In: Proceedings of SCA’02, pp 65–71

  39. Ruttkay Z, Noot H, ten Hagen P (2003) Emotion disc and emotion squares: tools to explore the facial expression space. Comput Graphics Forum 22(1):49–53

  40. Whissell C (1989) The dictionary of affect in language. In: Plutchik R, Kellerman H (eds) Emotion: theory, research, and experience, vol 4: The measurement of emotions, chap 5. Academic, San Diego, pp 113–131

  41. Plutchik R (1980) Emotions: a psychoevolutionary synthesis. Harper & Row, New York

  42. Bui T, Heylen D, Nijholt A (2004) Combination of facial movements on a 3D talking head. In: Proceedings of CGI’04, pp 284–291

  43. Schröder M (2004) Speech and emotion research: an overview of research frameworks and a dimensional approach to emotional speech synthesis. PhD thesis, vol 7 of Phonus, Research Report of the Institute of Phonetics, Saarland University. http://www.dfki.de/~schroed

  44. Cowie R, Douglas-Cowie E, Savvidou S, McMahon E, Sawey M, Schröder M (2000) ‘FEELTRACE’: an instrument for recording perceived emotion in real time. In: Proceedings of the ISCA workshop on speech and emotion, Northern Ireland, pp 19–24. http://www.qub.ac.uk/en/isca/proceedings

  45. Douglas-Cowie E, Campbell N, Cowie R, Roach P (2003) Emotional speech: towards a new generation of databases. Speech Commun Spec Issue Speech Emotion 40(1–2):33–60

  46. Cowie R, Douglas-Cowie E, Appolloni B, Taylor J, Romano A, Fellenz W (1999) What a neural net needs to know about emotion words. In: Mastorakis N (ed) Computational intelligence and applications. World Scientific & Engineering Society Press, pp 109–114

  47. Krenn B, Pirker H, Grice M, Piwek P, van Deemter K, Schröder M, Klesen M, Gstrein E (2002) Generation of multimodal dialogue for net environments. In: Proceedings of Konvens, Saarbrücken. http://www.ai.univie.ac.at/NECA

  48. Schröder M, Breuer S (2004) XML representation languages as a way of interconnecting TTS modules. In: Proceedings of ICSLP’04, Jeju

  49. Kähler K, Haber J, Seidel H-P (2002) Head shop: generating animated head models with anatomical structure. In: Proceedings of SCA’02, pp 55–64

  50. Ekman P, Friesen WV (1969) The repertoire of nonverbal behavior: categories, origins, usage and coding. Semiotica 1:49–98

Acknowledgements

Part of this research is supported by the EC Project HUMAINE (IST-507422).

Author information

Correspondence to Irene Albrecht.


Cite this article

Albrecht, I., Schröder, M., Haber, J. et al. Mixed feelings: expression of non-basic emotions in a muscle-based talking head. Virtual Reality 8, 201–212 (2005). https://doi.org/10.1007/s10055-005-0153-5
