Abstract
Affective and human-centered computing have attracted substantial attention in recent years, mainly due to the abundance of environments and applications able to exploit and adapt to multimodal input from users. Combining facial expressions with prosodic information allows us to capture a user's emotional state unobtrusively, relying on the better-performing modality when the other suffers from noise or poor sensing conditions. In this paper, we describe a multi-cue, dynamic approach to detecting emotion in naturalistic video sequences, where input is drawn from near-real-world situations rather than the controlled recording conditions typical of audiovisual material. Recognition is performed via a recurrent neural network, whose short-term memory and approximation capabilities cater for modeling dynamic events in facial and prosodic expressivity. The approach also differs from existing work in that it models user expressivity using a dimensional representation, instead of detecting discrete 'universal emotions', which are scarce in everyday human-machine interaction. The algorithm is evaluated on an audiovisual database recorded to simulate human-human discourse, which therefore contains less extreme expressivity and subtle variations across a number of emotion labels. Results show that for turns lasting more than a few frames, recognition rates rise to 98%.
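The pipeline the abstract outlines — per-frame facial and prosodic features fused and fed to a recurrent network with short-term memory, producing a dimensional (valence, activation) output — can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: all layer sizes, weights, and the `run_turn` helper are hypothetical, and a simple Elman-style recurrence stands in for whatever network topology the paper actually uses.

```python
import numpy as np

# Hypothetical feature dimensions: facial features (e.g. tracked feature-point
# distances) and prosodic features (e.g. pitch/energy statistics) per frame.
N_FACIAL, N_PROSODIC, N_HIDDEN, N_OUT = 10, 4, 8, 2

rng = np.random.default_rng(0)
# Untrained illustrative weights; in practice these would be learned.
W_in = rng.normal(scale=0.1, size=(N_HIDDEN, N_FACIAL + N_PROSODIC))
W_rec = rng.normal(scale=0.1, size=(N_HIDDEN, N_HIDDEN))  # recurrent "short-term memory"
W_out = rng.normal(scale=0.1, size=(N_OUT, N_HIDDEN))

def run_turn(facial_frames, prosodic_frames):
    """Process one speaker turn frame by frame.

    The recurrent hidden state carries context across frames, which is what
    lets the network model the *dynamics* of expressivity rather than
    classifying each frame in isolation.
    """
    h = np.zeros(N_HIDDEN)
    for f, p in zip(facial_frames, prosodic_frames):
        x = np.concatenate([f, p])         # feature-level fusion of the two cues
        h = np.tanh(W_in @ x + W_rec @ h)  # Elman-style recurrence
    # Dimensional output: (valence, activation), each squashed into [-1, 1].
    return np.tanh(W_out @ h)

# Usage: random features stand in for a 25-frame turn of real measurements.
turn = run_turn(rng.normal(size=(25, N_FACIAL)),
                rng.normal(size=(25, N_PROSODIC)))
```

Representing the output as continuous valence/activation coordinates, rather than a softmax over six basic emotions, is what allows the subtle, non-extreme states found in naturalistic discourse to be captured.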
Caridakis, G., Karpouzis, K., Wallace, M. et al. Multimodal user’s affective state analysis in naturalistic interaction. J Multimodal User Interfaces 3, 49–66 (2010). https://doi.org/10.1007/s12193-009-0030-8