Audio-Visual Spontaneous Emotion Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4451)

Abstract

Automatic multimodal recognition of spontaneous emotional expressions is a largely unexplored and challenging problem. In this paper, we explore audio-visual emotion recognition in a realistic human conversation setting, the Adult Attachment Interview (AAI). Based on the assumption that facial expression and vocal expression reflect the same coarse affective state, positive and negative emotion sequences are labeled according to the Facial Action Coding System. Facial texture in the visual channel and prosody in the audio channel are integrated within an Adaboost multi-stream hidden Markov model (AdaMHMM) framework, in which the Adaboost learning scheme is used to build the component HMM fusion. Our approach is evaluated in AAI spontaneous emotion recognition experiments.
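The multi-stream fusion idea in the abstract can be sketched as follows: each stream (audio prosody, visual facial texture) scores an observation sequence with a per-class model, and the streams' log-likelihoods are combined by a weighted sum before taking the argmax over classes. This is an illustrative sketch only; the function names, the Gaussian scorers (standing in for the paper's trained component HMMs), and the fixed weights (which the paper learns via the Adaboost scheme) are all assumptions, not the authors' implementation.

```python
import math

def gaussian_loglik(seq, mean):
    """Stand-in for a component HMM's log-likelihood: a unit-variance
    Gaussian evaluated at the sequence mean (hypothetical scorer)."""
    m = sum(seq) / len(seq)
    return -0.5 * (m - mean) ** 2 - 0.5 * math.log(2 * math.pi)

CLASSES = ["positive", "negative"]

# Illustrative class-conditional model parameters per stream (assumed values).
MODELS = {
    "audio":  {"positive": 1.0, "negative": -1.0},   # e.g. prosody features
    "visual": {"positive": 0.5, "negative": -0.5},   # e.g. facial texture
}

def fuse_and_classify(obs, weights):
    """Weighted sum of per-stream log-likelihoods, argmax over classes.
    In the paper the stream weights come from the Adaboost learning
    scheme; here they are fixed purely for illustration."""
    scores = {}
    for c in CLASSES:
        scores[c] = sum(
            weights[s] * gaussian_loglik(obs[s], MODELS[s][c])
            for s in MODELS
        )
    return max(scores, key=scores.get)

obs = {"audio": [0.9, 1.1, 1.0], "visual": [0.4, 0.6]}
print(fuse_and_classify(obs, {"audio": 0.6, "visual": 0.4}))  # -> positive
```

The weighted-sum decision rule is the common thread of multi-stream HMM fusion; what distinguishes the AdaMHMM of the paper is how the combination weights are learned, via boosting, rather than set by hand as above.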




Editor information

Thomas S. Huang, Anton Nijholt, Maja Pantic, Alex Pentland


Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Zeng, Z., Hu, Y., Roisman, G.I., Wen, Z., Fu, Y., Huang, T.S. (2007). Audio-Visual Spontaneous Emotion Recognition. In: Huang, T.S., Nijholt, A., Pantic, M., Pentland, A. (eds) Artifical Intelligence for Human Computing. Lecture Notes in Computer Science (LNAI), vol 4451. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72348-6_4

  • DOI: https://doi.org/10.1007/978-3-540-72348-6_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72346-2

  • Online ISBN: 978-3-540-72348-6

  • eBook Packages: Computer Science (R0)
