Abstract
Automatic multimodal recognition of spontaneous emotional expressions is a largely unexplored and challenging problem. In this paper, we explore audio-visual emotion recognition in a realistic human conversation setting: the Adult Attachment Interview (AAI). Based on the assumption that facial and vocal expression convey the same coarse affective state, positive and negative emotion sequences are labeled according to the Facial Action Coding System (FACS). Facial texture in the visual channel and prosody in the audio channel are integrated in the framework of an AdaBoost multi-stream hidden Markov model (AdaMHMM), in which an AdaBoost learning scheme is used to build the component HMM fusion. Our approach is evaluated in spontaneous emotion recognition experiments on AAI data.
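The multi-stream fusion idea described above can be sketched in a few lines: each modality (facial texture, prosody) contributes a per-class HMM log-likelihood, and the streams are combined with reliability weights of the kind AdaBoost assigns to weak learners. This is a minimal illustrative sketch, not the authors' implementation; all function names, error rates, and likelihood values below are assumptions for illustration.

```python
import math

# Toy per-stream log-likelihoods for one test sequence: each dict maps an
# emotion class to log P(observations | class) under that stream's HMM.
# (Illustrative numbers; in practice these come from trained component HMMs.)
face_loglik    = {"positive": -12.0, "negative": -15.0}
prosody_loglik = {"positive": -20.0, "negative": -18.5}

def adaboost_stream_weight(error_rate):
    """Classic AdaBoost weight for a weak learner with the given training error."""
    return 0.5 * math.log((1.0 - error_rate) / error_rate)

def fuse_and_classify(stream_logliks, weights):
    """Weighted sum of per-stream log-likelihoods; return the best-scoring class."""
    classes = stream_logliks[0].keys()
    scores = {c: sum(w * s[c] for w, s in zip(weights, stream_logliks))
              for c in classes}
    return max(scores, key=scores.get)

# Suppose the facial stream misclassifies 20% of training sequences and the
# prosodic stream 35%; AdaBoost gives the more reliable stream a larger weight.
w_face    = adaboost_stream_weight(0.20)
w_prosody = adaboost_stream_weight(0.35)

label = fuse_and_classify([face_loglik, prosody_loglik], [w_face, w_prosody])
print(label)  # → positive
```

Here the facial stream's stronger preference for "positive" outweighs the prosodic stream's weaker preference for "negative" because the facial stream earned the larger reliability weight.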
Copyright information
© 2007 Springer Berlin Heidelberg
Cite this paper
Zeng, Z., Hu, Y., Roisman, G.I., Wen, Z., Fu, Y., Huang, T.S. (2007). Audio-Visual Spontaneous Emotion Recognition. In: Huang, T.S., Nijholt, A., Pantic, M., Pentland, A. (eds) Artificial Intelligence for Human Computing. Lecture Notes in Computer Science, vol 4451. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72348-6_4
Print ISBN: 978-3-540-72346-2
Online ISBN: 978-3-540-72348-6