Abstract
Effective analysis and recognition of human emotional behavior are important for achieving efficient and intelligent human-computer interaction. This paper presents an approach to audiovisual multimodal emotion recognition. The proposed solution integrates audio and visual information by fusing the kernel matrices of the respective channels through algebraic operations, then applies dimensionality reduction to map the original disparate features into a nonlinearly transformed joint subspace. A hidden Markov model is employed to characterize the statistical dependence across successive frames and to identify the inherent temporal structure of the features. We examine the kernel fusion method at both the feature level and the score level. The effectiveness of the proposed method is demonstrated through extensive experimentation.
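The fusion step described above can be illustrated with a minimal sketch. The abstract does not state which kernels, algebraic operations, or dimensionality reduction method the paper uses, so everything below is an assumption made for illustration: Gaussian (RBF) kernels, a weighted sum or element-wise product as the fusion operation, kernel PCA as the reduction step, and random arrays standing in for real audio and visual features. The function names (rbf_kernel, fuse_kernels, kernel_pca) are hypothetical and not taken from the paper.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix computed over the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def fuse_kernels(K_audio, K_visual, mode="sum", weight=0.5):
    """Combine two kernel matrices with a simple algebraic operation.

    Both a convex combination and an element-wise product of valid
    kernel matrices are again valid (positive semidefinite) kernels.
    """
    if mode == "sum":
        return weight * K_audio + (1.0 - weight) * K_visual
    if mode == "product":
        return K_audio * K_visual  # element-wise (Schur) product
    raise ValueError(f"unknown fusion mode: {mode!r}")


def kernel_pca(K, n_components):
    """Project the training samples onto the leading kernel principal axes."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    Kc = H @ K @ H                        # kernel centered in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Training projections are sqrt(lambda_k) * v_k for unit eigenvectors v_k.
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


# Placeholder data: 20 samples with 12 audio and 30 visual dimensions.
rng = np.random.default_rng(0)
audio = rng.normal(size=(20, 12))
visual = rng.normal(size=(20, 30))

K_audio = rbf_kernel(audio, gamma=0.05)
K_visual = rbf_kernel(visual, gamma=0.02)

K_fused = fuse_kernels(K_audio, K_visual, mode="sum")
Z = kernel_pca(K_fused, n_components=3)
print(Z.shape)  # (20, 3)
```

Because the two modalities are combined only through their kernel matrices, the audio and visual features may have different dimensionalities, and the joint subspace is obtained without concatenating raw feature vectors. In a pipeline like the one the abstract outlines, the rows of Z (one per frame) could then serve as observations for a temporal model such as a hidden Markov model.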
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, Y., Zhang, R., Guan, L., Venetsanopoulos, A.N. (2011). Kernel Fusion of Audio and Visual Information for Emotion Recognition. In: Kamel, M., Campilho, A. (eds) Image Analysis and Recognition. ICIAR 2011. Lecture Notes in Computer Science, vol 6754. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21596-4_15
DOI: https://doi.org/10.1007/978-3-642-21596-4_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21595-7
Online ISBN: 978-3-642-21596-4
eBook Packages: Computer Science, Computer Science (R0)