Semi-Coupled Hidden Markov Model with State-Based Alignment Strategy for Audio-Visual Emotion Recognition

Lin, Jen-Chun; Wu, Chung-Hsien; Wei, Wen-Li

doi:10.1007/978-3-642-24600-5_22


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6974)

Included in the following conference series:

International Conference on Affective Computing and Intelligent Interaction


Abstract

This paper presents an approach to bi-modal emotion recognition based on a semi-coupled hidden Markov model (SC-HMM). A simplified state-based bi-modal alignment strategy is proposed within the SC-HMM to align the temporal relation between the states of the audio and visual streams. With this strategy, the proposed SC-HMM can alleviate the problem of data sparseness and better capture the statistical dependency between the states of the audio and visual HMMs in most real-world scenarios. For performance evaluation, audio-visual signals covering four emotional states (happy, neutral, angry and sad) were collected. Each of the seven invited subjects was asked to utter 30 types of sentences twice for each emotion, producing both emotional speech and facial expressions. Experimental results show that the proposed bi-modal approach outperforms other fusion-based bi-modal emotion recognition methods.
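The abstract names a state-based alignment strategy but does not describe its mechanics. The Python sketch below is a hypothetical illustration only, not the authors' SC-HMM formulation. It assumes each stream has already been decoded (for example with Viterbi) into a frame-level state sequence, maps both sequences onto a shared normalized time axis so the different audio and video frame rates can be compared, and turns the temporal overlap of audio-state and visual-state segments into a smoothed conditional table P(visual state | audio state). The function names `segments` and `state_coupling` and the smoothing parameter `alpha` are invented for this example.

```python
def segments(states):
    """Collapse a frame-level state sequence into (state, start, end) runs,
    with start/end normalized to [0, 1] by the sequence length."""
    n = len(states)
    runs = []
    start = 0
    for t in range(1, n + 1):
        # Close the current run at the end of the sequence or on a state change.
        if t == n or states[t] != states[start]:
            runs.append((states[start], start / n, t / n))
            start = t
    return runs


def state_coupling(audio_states, visual_states, n_audio, n_visual, alpha=1.0):
    """Hypothetical estimate of P(visual state | audio state).

    Both decoded sequences are normalized to unit duration, so streams with
    different frame rates share one time axis. The temporal overlap of each
    audio-state run with each visual-state run is added as a soft count, and
    additive smoothing (alpha) keeps unseen state pairs from receiving zero
    probability when training data are sparse.
    """
    counts = [[alpha] * n_visual for _ in range(n_audio)]
    for a, a_start, a_end in segments(audio_states):
        for v, v_start, v_end in segments(visual_states):
            overlap = min(a_end, v_end) - max(a_start, v_start)
            if overlap > 0:
                counts[a][v] += overlap
    probs = []
    for row in counts:
        total = sum(row)
        # An audio state that never occurs (possible when alpha=0) falls back to uniform.
        probs.append([c / total for c in row] if total > 0
                     else [1.0 / n_visual] * n_visual)
    return probs


# Usage: 12 audio frames vs. 6 video frames, 3 states per stream (toy data).
audio = [0] * 4 + [1] * 4 + [2] * 4
visual = [0, 0, 1, 1, 1, 2]
for row in state_coupling(audio, visual, 3, 3, alpha=0.0):
    print([round(p, 2) for p in row])
# [1.0, 0.0, 0.0]
# [0.0, 1.0, 0.0]
# [0.0, 0.5, 0.5]
```

Estimating one table entry per state pair, rather than per frame pair, is one plausible reading of how a state-level alignment could ease data sparseness; the additive smoothing is likewise an illustrative choice rather than something stated in the abstract.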




Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
Jen-Chun Lin, Chung-Hsien Wu & Wen-Li Wei


Editor information

Editors and Affiliations

University of Memphis, 202 Psychology Building, 38152, Memphis, TN, USA
Sidney D’Mello & Arthur Graesser
Technische Universität München, Arcisstraße 21, 80333, München, Germany
Björn Schuller
Laboratoire d’Informatique pour la Mécanique et les Sciences de l’Ingénieur (LIMSI-CNRS), Bâtiment 508, 91403, Orsay Cedex, France
Jean-Claude Martin


About this paper

Cite this paper

Lin, JC., Wu, CH., Wei, WL. (2011). Semi-Coupled Hidden Markov Model with State-Based Alignment Strategy for Audio-Visual Emotion Recognition. In: D’Mello, S., Graesser, A., Schuller, B., Martin, JC. (eds) Affective Computing and Intelligent Interaction. ACII 2011. Lecture Notes in Computer Science, vol 6974. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24600-5_22


DOI: https://doi.org/10.1007/978-3-642-24600-5_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24599-2
Online ISBN: 978-3-642-24600-5
eBook Packages: Computer Science, Computer Science (R0)
