Abstract
Recording and annotating a multimodal database of natural expressivity is a task that requires careful planning and implementation before feature extraction and recognition algorithms can even be applied. The requirements and characteristics of such databases differ inherently from those of acted behaviour, both in terms of the unconstrained expressivity of the human participants and in terms of the emotions they express. In this paper, we describe a method to induce, record and annotate natural emotions, which was used to provide multimodal data for dynamic emotion recognition from facial expressions and speech prosody; results from a dynamic recognition algorithm based on recurrent neural networks indicate that multimodal processing surpasses both speech-only and visual-only analysis by a wide margin. The SAL database was used in the framework of the Humaine Network of Excellence as common ground for research into everyday, natural emotions.
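The abstract does not detail the recurrent architecture itself, so the sketch below is only a rough illustration of the kind of dynamic, feature-level multimodal processing it refers to: an Elman-style recurrent network that consumes concatenated facial-expression and speech-prosody feature vectors frame by frame and emits a per-frame activation/valence estimate. All dimensions, names and the fusion-by-concatenation choice are assumptions made for this example, not the authors' actual implementation.

```python
import numpy as np

# Minimal Elman-style recurrent network sketch for frame-level multimodal
# emotion recognition. Facial and prosodic features are fused by simple
# concatenation at every frame; weights are random placeholders rather than
# trained parameters, so the outputs are meaningless except as a shape check.

rng = np.random.default_rng(0)

FACE_DIM, PROSODY_DIM, HIDDEN_DIM, OUT_DIM = 10, 6, 32, 2  # assumed sizes

W_in = rng.normal(scale=0.1, size=(HIDDEN_DIM, FACE_DIM + PROSODY_DIM))
W_rec = rng.normal(scale=0.1, size=(HIDDEN_DIM, HIDDEN_DIM))
W_out = rng.normal(scale=0.1, size=(OUT_DIM, HIDDEN_DIM))
b_h = np.zeros(HIDDEN_DIM)
b_o = np.zeros(OUT_DIM)

def recognise_sequence(face_feats, prosody_feats):
    """Run the recurrent net over one clip.

    face_feats:    (T, FACE_DIM) array, e.g. normalised facial-point features
    prosody_feats: (T, PROSODY_DIM) array, e.g. pitch/energy statistics
    Returns a (T, 2) array of activation/valence estimates, one per frame.
    """
    h = np.zeros(HIDDEN_DIM)                       # recurrent state carries temporal context
    outputs = []
    for face, prosody in zip(face_feats, prosody_feats):
        x = np.concatenate([face, prosody])        # feature-level fusion of the two modalities
        h = np.tanh(W_in @ x + W_rec @ h + b_h)    # Elman-style hidden-state update
        outputs.append(np.tanh(W_out @ h + b_o))   # bounded activation/valence output
    return np.array(outputs)

# Toy usage with random features standing in for a 50-frame clip.
T = 50
estimates = recognise_sequence(rng.normal(size=(T, FACE_DIM)),
                               rng.normal(size=(T, PROSODY_DIM)))
print(estimates.shape)  # (50, 2)
```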



Cite this article
Karpouzis, K., Caridakis, G., Cowie, R. et al. Induction, recording and recognition of natural emotions from facial expressions and speech prosody. J Multimodal User Interfaces 7, 195–206 (2013). https://doi.org/10.1007/s12193-013-0122-3