Kernel Fusion of Audio and Visual Information for Emotion Recognition

Wang, Yongjin; Zhang, Rui; Guan, Ling; Venetsanopoulos, A. N.

doi:10.1007/978-3-642-21596-4_15

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6754)

Included in the following conference series:

International Conference Image Analysis and Recognition

Abstract

Effective analysis and recognition of human emotional behavior are important for achieving efficient and intelligent human-computer interaction. This paper presents an approach to audiovisual multimodal emotion recognition. The proposed solution integrates audio and visual information by fusing the kernel matrices of the respective channels through algebraic operations, then applies dimensionality reduction to map the original disparate features into a nonlinearly transformed joint subspace. A hidden Markov model characterizes the statistical dependence across successive frames and identifies the inherent temporal structure of the features. We examine the kernel fusion method at both the feature and score levels. The effectiveness of the proposed method is demonstrated through extensive experimentation.
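
The kernel fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which algebraic operations, kernel function, or dimensionality reduction technique the authors use, so everything specific here is an assumption: an RBF kernel, fusion by weighted sum or element-wise product, and kernel PCA for the projection. The function names, the `gamma` and `weight` parameters, and the random feature matrices are hypothetical and are not taken from the paper.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) (assumed kernel)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def fuse_kernels(K_audio, K_visual, mode="sum", weight=0.5):
    """Combine two Gram matrices over the same samples.

    Both operations keep the result symmetric positive semidefinite,
    so the fused matrix is still a valid kernel matrix.
    """
    if mode == "sum":
        return weight * K_audio + (1.0 - weight) * K_visual
    if mode == "product":
        return K_audio * K_visual  # element-wise (Schur) product
    raise ValueError(f"unknown fusion mode: {mode}")


def kernel_pca(K, n_components):
    """Project the training samples onto the leading kernel principal axes."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = H @ K @ H                           # centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(Kc)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


# Hypothetical data: 20 samples, 5 audio features and 8 visual features each.
rng = np.random.default_rng(0)
audio = rng.normal(size=(20, 5))
visual = rng.normal(size=(20, 8))

K_a = rbf_kernel(audio, gamma=0.1)
K_v = rbf_kernel(visual, gamma=0.1)
K_sum = fuse_kernels(K_a, K_v, mode="sum")
K_prod = fuse_kernels(K_a, K_v, mode="product")

Z = kernel_pca(K_sum, n_components=3)  # joint-subspace features
```

In this sketch the two modalities may have different feature dimensions, because fusion happens on the sample-by-sample kernel matrices rather than on the raw feature vectors. The rows of `Z` could then be passed, frame by frame, to a temporal model such as the hidden Markov model the abstract mentions.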

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Ryerson University, Toronto, Ontario, Canada
Yongjin Wang, Rui Zhang, Ling Guan & A. N. Venetsanopoulos

Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, University of Waterloo, N2L 3G1, Waterloo, ON, Canada
Mohamed Kamel
Faculty of Engineering, Institute of Biomedical Engineering, University of Porto, Rua Dr. Roberto Frias, 4200-465, Porto, Portugal
Aurélio Campilho

Cite this paper

Wang, Y., Zhang, R., Guan, L., Venetsanopoulos, A.N. (2011). Kernel Fusion of Audio and Visual Information for Emotion Recognition. In: Kamel, M., Campilho, A. (eds) Image Analysis and Recognition. ICIAR 2011. Lecture Notes in Computer Science, vol 6754. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21596-4_15

DOI: https://doi.org/10.1007/978-3-642-21596-4_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21595-7
Online ISBN: 978-3-642-21596-4
eBook Packages: Computer Science, Computer Science (R0)
