
Induction, recording and recognition of natural emotions from facial expressions and speech prosody

  • Original Paper
  • Journal on Multimodal User Interfaces

Abstract

Recording and annotating a multimodal database of natural expressivity is a task that requires careful planning and implementation, even before feature extraction and recognition algorithms can be applied. The requirements and characteristics of such databases differ inherently from those of acted behaviour, both in terms of the unconstrained expressivity of the human participants and in terms of the emotions expressed. In this paper, we describe a method to induce, record and annotate natural emotions, which was used to provide multimodal data for dynamic emotion recognition from facial expressions and speech prosody; results from a dynamic recognition algorithm based on recurrent neural networks indicate that multimodal processing surpasses both speech-only and visual-only analysis by a wide margin. The SAL database was used in the framework of the Humaine Network of Excellence as a common ground for research into everyday, natural emotions.
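To illustrate the kind of dynamic recogniser the abstract refers to, the sketch below runs a small Elman-style recurrent network over a synchronized sequence of facial and prosodic feature vectors fused at the feature level, producing one estimate per frame. It is a minimal sketch only: the feature dimensions, the fusion scheme and the (activation, valence) output are assumptions made for the example, not the configuration reported in the paper.

```python
import numpy as np

# Hypothetical feature dimensions: facial and prosodic features per analysis
# frame; the actual feature sets and sizes used in the paper may differ.
FACE_DIM, PROSODY_DIM, HIDDEN_DIM, OUT_DIM = 20, 12, 32, 2  # OUT_DIM: activation, valence

rng = np.random.default_rng(0)
W_in = rng.normal(0, 0.1, (HIDDEN_DIM, FACE_DIM + PROSODY_DIM))
W_rec = rng.normal(0, 0.1, (HIDDEN_DIM, HIDDEN_DIM))
W_out = rng.normal(0, 0.1, (OUT_DIM, HIDDEN_DIM))

def elman_forward(face_seq, prosody_seq):
    """Run an Elman-style recurrent net over a synchronized multimodal sequence.

    face_seq: array of shape (T, FACE_DIM), prosody_seq: array of shape
    (T, PROSODY_DIM), one feature vector per frame / analysis window.
    Returns one (activation, valence) estimate per time step.
    """
    h = np.zeros(HIDDEN_DIM)
    outputs = []
    for x_face, x_pros in zip(face_seq, prosody_seq):
        x = np.concatenate([x_face, x_pros])   # feature-level fusion of modalities
        h = np.tanh(W_in @ x + W_rec @ h)      # recurrent hidden state update
        outputs.append(W_out @ h)              # per-frame prediction
    return np.array(outputs)

# Toy usage: 100 frames of random features standing in for real data.
preds = elman_forward(rng.normal(size=(100, FACE_DIM)),
                      rng.normal(size=(100, PROSODY_DIM)))
print(preds.shape)  # (100, 2)
```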



Author information

Correspondence to George Caridakis.


About this article

Cite this article

Karpouzis, K., Caridakis, G., Cowie, R. et al. Induction, recording and recognition of natural emotions from facial expressions and speech prosody. J Multimodal User Interfaces 7, 195–206 (2013). https://doi.org/10.1007/s12193-013-0122-3
