
Induction, recording and recognition of natural emotions from facial expressions and speech prosody

  • Original Paper
  • Journal on Multimodal User Interfaces

Abstract

Recording and annotating a multimodal database of natural expressivity is a task that requires careful planning and implementation, even before feature extraction and recognition algorithms can be applied. The requirements and characteristics of such databases differ inherently from those of acted behaviour, both in terms of the unconstrained expressivity of the human participants and in terms of the emotions expressed. In this paper, we describe a method to induce, record and annotate natural emotions, which was used to provide multimodal data for dynamic emotion recognition from facial expressions and speech prosody; results from a dynamic recognition algorithm based on recurrent neural networks indicate that multimodal processing surpasses both speech-only and visual-only analysis by a wide margin. The SAL database was used in the framework of the Humaine Network of Excellence as a common ground for research into everyday, natural emotions.
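To illustrate the kind of dynamic recogniser the abstract refers to, the sketch below runs a small Elman-style recurrent network over a synchronized sequence of facial and prosodic feature vectors fused at the feature level, producing one estimate per frame. It is a minimal sketch only: the feature dimensions, the fusion scheme and the (activation, valence) output are assumptions made for the example, not the configuration reported in the paper.

```python
import numpy as np

# Hypothetical feature dimensions: facial and prosodic features per analysis
# frame; the actual feature sets and sizes used in the paper may differ.
FACE_DIM, PROSODY_DIM, HIDDEN_DIM, OUT_DIM = 20, 12, 32, 2  # OUT_DIM: activation, valence

rng = np.random.default_rng(0)
W_in = rng.normal(0, 0.1, (HIDDEN_DIM, FACE_DIM + PROSODY_DIM))
W_rec = rng.normal(0, 0.1, (HIDDEN_DIM, HIDDEN_DIM))
W_out = rng.normal(0, 0.1, (OUT_DIM, HIDDEN_DIM))

def elman_forward(face_seq, prosody_seq):
    """Run an Elman-style recurrent net over a synchronized multimodal sequence.

    face_seq: array of shape (T, FACE_DIM), prosody_seq: array of shape
    (T, PROSODY_DIM), one feature vector per frame / analysis window.
    Returns one (activation, valence) estimate per time step.
    """
    h = np.zeros(HIDDEN_DIM)
    outputs = []
    for x_face, x_pros in zip(face_seq, prosody_seq):
        x = np.concatenate([x_face, x_pros])   # feature-level fusion of modalities
        h = np.tanh(W_in @ x + W_rec @ h)      # recurrent hidden state update
        outputs.append(W_out @ h)              # per-frame prediction
    return np.array(outputs)

# Toy usage: 100 frames of random features standing in for real data.
preds = elman_forward(rng.normal(size=(100, FACE_DIM)),
                      rng.normal(size=(100, PROSODY_DIM)))
print(preds.shape)  # (100, 2)
```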



Author information

Correspondence to George Caridakis.


About this article

Cite this article

Karpouzis, K., Caridakis, G., Cowie, R. et al. Induction, recording and recognition of natural emotions from facial expressions and speech prosody. J Multimodal User Interfaces 7, 195–206 (2013). https://doi.org/10.1007/s12193-013-0122-3
