Abstract
We present an algorithm for generating facial expressions for a continuum of pure and mixed emotions of varying intensity. Based on the observation that in natural interaction among humans, shades of emotion are encountered far more frequently than expressions of basic emotions, a method is required that can generate expressions beyond Ekman’s six basic emotions (joy, anger, fear, sadness, disgust and surprise). To this end, we have adapted the algorithm proposed by Tsapatsoulis et al. [1] to a physics-based facial animation system and a single, integrated emotion model. This facial animation system was combined with an equally flexible and expressive text-to-speech synthesis system, built upon the same emotion model, to form a talking head capable of expressing non-basic emotions of varying intensities. Using a variety of life-like intermediate facial expressions, captured as snapshots from the system, we demonstrate the appropriateness of our approach.
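To make the general blending scheme concrete, the following minimal Python sketch interpolates between the expression parameters of neighbouring basic-emotion anchors on the activation-evaluation disc and scales the result by emotion intensity. All anchor angles, muscle names, and parameter values are hypothetical placeholders chosen for illustration; the sketch mirrors the kind of interpolation adapted from Tsapatsoulis et al. [1], not our actual implementation.

```python
# Angular positions (degrees) of basic-emotion anchors on the
# activation-evaluation disc, plus illustrative muscle-contraction vectors.
# All names and numbers here are hypothetical placeholders.
ANCHORS = [
    ("joy",      20.0,  {"zygomaticus": 0.9, "frontalis": 0.3, "corrugator": 0.0}),
    ("surprise", 85.0,  {"zygomaticus": 0.1, "frontalis": 0.9, "corrugator": 0.0}),
    ("fear",     140.0, {"zygomaticus": 0.0, "frontalis": 0.8, "corrugator": 0.5}),
    ("anger",    190.0, {"zygomaticus": 0.0, "frontalis": 0.1, "corrugator": 0.9}),
    ("disgust",  230.0, {"zygomaticus": 0.0, "frontalis": 0.0, "corrugator": 0.7}),
    ("sadness",  280.0, {"zygomaticus": 0.0, "frontalis": 0.4, "corrugator": 0.6}),
]

def mixed_expression(angle_deg: float, intensity: float) -> dict:
    """Expression for an arbitrary emotion: blend the two anchors that
    bracket angle_deg on the disc, then scale by intensity
    (0 = neutral face, 1 = full-blown emotion)."""
    angle = angle_deg % 360.0
    anchors = sorted(ANCHORS, key=lambda a: a[1])
    for i in range(len(anchors)):
        _, lo_ang, lo_vec = anchors[i]
        _, hi_ang, hi_vec = anchors[(i + 1) % len(anchors)]
        span = (hi_ang - lo_ang) % 360.0      # arc between the two anchors
        offset = (angle - lo_ang) % 360.0     # position of angle on that arc
        if offset <= span:
            w = offset / span                  # angular interpolation weight
            return {k: intensity * ((1.0 - w) * lo_vec[k] + w * hi_vec[k])
                    for k in lo_vec}
    raise ValueError("unreachable: anchors cover the full disc")

# Example: a mildly pleasant surprise, halfway between joy and surprise.
print(mixed_expression(52.5, 0.5))
```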
Notes
A publicly accessible web interface can be found at http://mary.dfki.de.
A phrase is a part of a sentence delimited by grammatical pauses.
References
Tsapatsoulis N, Raouzaiou A, Kollias S, Cowie R, Douglas-Cowie E (2002) Emotion recognition and synthesis based on MPEG-4 FAPs. In: MPEG-4 facial animation—the standard, implementation and applications. Wiley, Hillsdale, pp 141–167
André E, Dybkjær L, Minker W, Heisterkamp P (eds) (2004) Proceedings of the tutorial and research workshop on affective dialogue systems (ADS04), vol 3068 of Lecture Notes in Artificial Intelligence, Kloster Irsee, Germany. Springer, Berlin Heidelberg New York
Cowie R, Cornelius R (2003) Describing the emotional states that are expressed in speech. Speech Commun Spec Issue Speech Emotion 40(1–2):5–32
Scherer K (2000) Psychological models of emotion. In: Borod JC (ed) The neuropsychology of emotion. Oxford University Press, Oxford, pp 137–162
The HUMAINE network portal. http://emotion-research.net
Schröder M, Cowie R, Douglas-Cowie E, Westerdijk M, Gielen S (2001) Acoustic correlates of emotion dimensions in view of speech synthesis. In: Proceedings of Eurospeech’01, vol 1, pp 87–90
Schröder M (2004) Dimensional emotion representation as a basis for speech synthesis with non-extreme emotions. In: Proceedings of the workshop on affective dialogue systems, Kloster Irsee, Germany, pp 209–220
Dutoit Th (1997) An introduction to text-to-speech synthesis. Kluwer, Dordrecht
Klabbers E, Stöber K, Veldhuis R, Wagner P, Breuer S (2001) Speech synthesis development made easy: the Bonn Open Synthesis System. In: Proceedings of Eurospeech 2001, Aalborg, pp 521–524
Schröder M, Trouvain J (2003) The German text-to-speech synthesis system MARY: a tool for research, development and teaching. Int J Speech Technol 6:365–377. http://mary.dfki.de
Banse R, Scherer K (1996) Acoustic profiles in vocal emotion expression. J Pers Soc Psychol 70(3):614–636
Yang L (2001) Prosody as expression of emotion. In: Cavé Ch (ed) Proceedings of ORAGE 2001, Oralité et gestualité, pp 209–212
Schröder M (2001) Emotional speech synthesis: a review. In: Proceedings of Eurospeech 2001, Aalborg, pp 561–564. http://www.dfki.de/~schroed
Allen J, Hunnicutt S, Klatt DH (1987) From text to speech: the MITalk system. Cambridge University Press, Cambridge
Cahn J (1990) The generation of affect in synthesized speech. J Am Voice I/O Soc 8:1–19
Black AW, Campbell N (1995) Optimising selection of units from speech databases for concatenative synthesis. In: Proceedings of Eurospeech 1995, Madrid, pp 581–584
Johnson W, Narayanan S, Whitney R, Das R, Bulut M, LaBore C (2002) Limited domain synthesis of expressive military speech for animated characters. In: Proceedings of the 7th international conference on spoken language processing, Denver
Dutoit Th, Pagel V, Pierret N, Bataille F, van der Vrecken O (1996) The MBROLA project: towards a set of high-quality speech synthesisers free of use for non-commercial purposes. In: Proceedings of the 4th international conference on spoken language processing, Philadelphia, pp 1393–1396
Schröder M, Grice M (2003) Expressing vocal effort in concatenative synthesis. In: Proceedings of the 15th international congress of phonetic sciences, Barcelona
Lee Y, Terzopoulos D, Waters K (1995) Realistic face modeling for animation. In: Proceedings of SIGGRAPH’95, pp 55–62
Kähler K, Haber J, Seidel HP (2001) Geometry-based muscle modeling for facial animation. In: Proceedings of Graphics Interface, pp 37–46
Bregler Ch, Covell M, Slaney M (1997) Video rewrite: driving visual speech with audio. In: Proceedings of SIGGRAPH ’97. ACM Press, Palo Alto, pp 353–360
Brand M (1999) Voice puppetry. In: Proceedings of SIGGRAPH ’99, pp 21–28
Ezzat T, Geiger G, Poggio T (2002) Trainable videorealistic speech animation. In: Proceedings of SIGGRAPH’02, pp 388–398
Parke F (1974) A parametric model for human faces. PhD Thesis, University of Utah, Salt Lake City
Cohen M, Massaro D (1993) Modeling coarticulation in synthetic visual speech. In: Magnenat-Thalmann N, Thalmann D (eds) Models and techniques in computer animation, pp 139–156
Pelachaud C, Badler N, Steedman M (1991) Linguistic issues in facial animation. In: Magnenat-Thalmann N, Thalmann D (eds) Computer Animation ’91
Kalberer G, Müller P, Van Gool L (2003) A visual speech generator. In: Proceedings of Videometrics VII. IS&T/SPIE, pp 173–183
Lee S, Badler J, Badler N (2002) Eyes alive. In: Proceedings of SIGGRAPH’02, pp 637–644
Pearce A, Wyvill B, Wyvill G, Hill D (1986) Speech and expression: a computer solution to face animation. In: Proceedings of Graphics Interface ’86, pp 136–140
Ip H, Chan C (1996) Script-based facial gesture and speech animation using a NURBS based face model. Comput Graphics 20(6):881–891
Kalra P, Mangili A, Magnenat-Thalmann N, Thalmann D (1991) SMILE: a multilayered facial animation system. In: Proceedings of IFIP WG 5.10, Tokyo, pp 189–198
Cassell J, Pelachaud C, Badler N, Steedman M, Achorn B, Becket T, Douville B, Prevost S, Stone M (1994) Animated conversation: rule-based generation of facial expression, gesture and spoken intonation for multiple conversational agents. In: Proceedings of SIGGRAPH ’94, pp 413–420
Pelachaud C, Badler N, Steedman M (1996) Generating facial expressions for speech. Cogn Sci 20(1):1–46
Albrecht I, Haber J, Seidel H-P (2002) Automatic generation of non-verbal facial expressions from speech. In: Proceedings of CGI, pp 283–293
Albrecht I, Haber J, Kähler K, Schröder M, Seidel H-P (2002) May I talk to you? :-)—facial animation from text. In: Proceedings of Pacific Graphics, pp 77–86
Ekman P, Keltner D (1997) Universal facial expressions of emotion: an old controversy and new findings. In: Segerstråle U, Molnár P (eds) Nonverbal communication: where nature meets culture. Lawrence Erlbaum Associates Inc., Mahwah, pp 27–46
Byun M, Badler N (2002) FacEMOTE: qualitative parametric modifiers for facial animations. In: Proceedings of SCA’02, pp 65–71
Ruttkay Z, Noot H, ten Hagen P (2003) Emotion disc and emotion squares: tools to explore the facial expression space. Comput Graphics Forum 22(1):49–53
Whissell C (1989) The dictionary of affect in language. In: Plutchik R, Kellerman H (eds) Emotion: theory, research, and experience, vol 4: The measurement of emotions, chap 5. Academic, San Diego, pp 113–131
Plutchik R (1980) Emotions: a psychoevolutionary synthesis. Harper & Row, New York
Bui T, Heylen D, Nijholt A (2004) Combination of facial movements on a 3D talking head. In: Proceedings of CGI’04, pp 284–291
Schröder M (2004) Speech and emotion research: an overview of research frameworks and a dimensional approach to emotional speech synthesis. PhD Thesis, vol 7 of Phonus, Research Report of the Institute of Phonetics, Saarland University. http://www.dfki.de/~schroed
Cowie R, Douglas-Cowie E, Savvidou S, McMahon E, Sawey M, Schröder M (2000) ‘FEELTRACE’: an instrument for recording perceived emotion in real time. In: Proceedings of the ISCA workshop on speech and emotion, Northern Ireland, pp 19–24. http://www.qub.ac.uk/en/isca/proceedings
Douglas-Cowie E, Campbell N, Cowie R, Roach P (2003) Emotional speech: towards a new generation of databases. Speech Commun Spec Issue Speech Emotion 40(1–2):33–60
Cowie R, Douglas-Cowie E, Appolloni B, Taylor J, Romano A, Fellenz W (1999) What a neural net needs to know about emotion words. In: Mastorakis N (ed) Computational intelligence and applications. World Scientific & Engineering Society Press, pp 109–114
Krenn B, Pirker H, Grice M, Piwek P, van Deemter K, Schröder M, Klesen M, Gstrein E (2002) Generation of multimodal dialogue for net environments. In: Proceedings of Konvens, Saarbrücken. http://www.ai.univie.ac.at/NECA
Schröder M, Breuer S (2004) XML representation languages as a way of interconnecting TTS modules. In: Proceedings of ICSLP’04, Jeju
Kähler K, Haber J, Seidel H-P (2002) Head shop: generating animated head models with anatomical structure. In: Proceedings of SCA’02, pp 55–64
Ekman P, Friesen WV (1969) The repertoire of nonverbal behavior: categories, origins, usage, and coding. Semiotica 1:49–98
Acknowledgements
Part of this research is supported by the EC Project HUMAINE (IST-507422).
Cite this article
Albrecht, I., Schröder, M., Haber, J. et al. Mixed feelings: expression of non-basic emotions in a muscle-based talking head. Virtual Reality 8, 201–212 (2005). https://doi.org/10.1007/s10055-005-0153-5