Abstract
This paper presents an approach to bi-modal emotion recognition based on a semi-coupled hidden Markov model (SC-HMM). A simplified state-based bi-modal alignment strategy is proposed within the SC-HMM to align the states of the audio and visual streams in time. With this strategy, the SC-HMM can alleviate the problem of data sparseness and achieve better statistical dependency between the states of the audio and visual HMMs in most real-world scenarios. For performance evaluation, audio-visual signals covering four emotional states (happy, neutral, angry, and sad) were collected: each of the seven invited subjects was asked to utter 30 types of sentences twice for each emotion, producing both emotional speech and facial expressions. Experimental results show that the proposed bi-modal approach outperforms other fusion-based bi-modal emotion recognition methods.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lin, JC., Wu, CH., Wei, WL. (2011). Semi-Coupled Hidden Markov Model with State-Based Alignment Strategy for Audio-Visual Emotion Recognition. In: D’Mello, S., Graesser, A., Schuller, B., Martin, JC. (eds) Affective Computing and Intelligent Interaction. ACII 2011. Lecture Notes in Computer Science, vol 6974. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24600-5_22
DOI: https://doi.org/10.1007/978-3-642-24600-5_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24599-2
Online ISBN: 978-3-642-24600-5
eBook Packages: Computer Science, Computer Science (R0)