Abstract
Automatic recognition of human affective states remains a largely unexplored and challenging topic. Even more issues arise when dealing with variable input quality or when aiming for real-time, unconstrained, and person-independent scenarios. In this paper, we explore audio-visual multimodal emotion recognition. We present SAMMI, a framework designed to extract real-time emotion appraisals from non-prototypical, person-independent facial expressions and vocal prosody. Different probabilistic fusion methods are compared and evaluated together with a novel fusion technique called NNET. Results show that NNET improves the recognition score (CR+) by about 19% and the mean average precision by about 30% with respect to the best unimodal system.
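To make the multimodal fusion idea concrete, the sketch below shows a plain score-level (late) fusion of per-class probabilities produced by an audio (prosody) classifier and a video (facial expression) classifier. This is an illustration only: the emotion set, weights, function names, and probability values are assumptions, and the weighted average shown here does not reproduce SAMMI's evidence theory-based NNET fusion.

```python
# Minimal sketch of score-level (late) fusion of audio and video emotion
# classifiers. Hypothetical values and names; not the NNET fusion itself.
import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def fuse_scores(p_audio, p_video, w_audio=0.5, w_video=0.5):
    """Combine per-class probabilities from the two modalities with a
    weighted average and renormalise the result."""
    p_audio = np.asarray(p_audio, dtype=float)
    p_video = np.asarray(p_video, dtype=float)
    fused = w_audio * p_audio + w_video * p_video
    return fused / fused.sum()

if __name__ == "__main__":
    # Hypothetical per-class outputs of the two unimodal classifiers.
    p_audio = [0.10, 0.05, 0.05, 0.55, 0.15, 0.10]   # vocal prosody
    p_video = [0.20, 0.05, 0.10, 0.40, 0.15, 0.10]   # facial expression
    fused = fuse_scores(p_audio, p_video)
    print("Predicted emotion:", EMOTIONS[int(np.argmax(fused))])
```

In practice the modality weights would be learned or derived from classifier reliability; the paper's NNET approach instead combines the unimodal outputs through a Dempster-Shafer evidence-theoretic neural network.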
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Paleari, M., Benmokhtar, R., Huet, B. (2009). Evidence Theory-Based Multimodal Emotion Recognition. In: Huet, B., Smeaton, A., Mayer-Patel, K., Avrithis, Y. (eds) Advances in Multimedia Modeling. MMM 2009. Lecture Notes in Computer Science, vol 5371. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-92892-8_44
DOI: https://doi.org/10.1007/978-3-540-92892-8_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-92891-1
Online ISBN: 978-3-540-92892-8