Combining Audio and Video for Detection of Spontaneous Emotions

Gajšek, Rok; Štruc, Vitomir; Dobrišek, Simon; Žibert, Janez; Mihelič, France; Pavešić, Nikola

doi:10.1007/978-3-642-04391-8_15

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5707)

Included in the following conference series:

European Workshop on Biometrics and Identity Management

Abstract

This paper presents our initial attempts at building an audio-video emotion recognition system. Both the audio and the video sub-systems are discussed, and a description of the database of spontaneous emotions is given. The task of labelling the recordings in the database according to different emotions is discussed, and the measured agreement between multiple annotators is presented. Instead of focusing on prosody in audio emotion recognition, we evaluate the possibility of using linear transformations (CMLLR) as features. The classification results from the audio and video sub-systems are combined using sum rule fusion, and the improvement in recognition results obtained by using both modalities is presented.
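
As an illustration of the sum rule fusion mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes each sub-system outputs non-negative per-class scores that can be normalised to sum to one; the function name, the `audio_weight` parameter, and the two-class example are hypothetical, since the abstract does not specify the classes, the score normalisation, or any weighting.

```python
import numpy as np


def sum_rule_fusion(audio_scores, video_scores, audio_weight=0.5):
    """Fuse per-class scores from two classifiers with a (weighted) sum rule.

    Assumption: scores are non-negative and comparable after being
    normalised to sum to one per sample. With audio_weight=0.5 this is
    the plain sum rule up to a constant factor.
    """
    audio = np.asarray(audio_scores, dtype=float)
    video = np.asarray(video_scores, dtype=float)
    if audio.shape != video.shape:
        raise ValueError("audio and video scores must have the same shape")

    # Normalise each modality so that its class scores sum to one.
    audio = audio / audio.sum(axis=-1, keepdims=True)
    video = video / video.sum(axis=-1, keepdims=True)

    # Weighted sum of the two normalised score vectors.
    fused = audio_weight * audio + (1.0 - audio_weight) * video

    # The decision is the class with the highest fused score.
    return fused, fused.argmax(axis=-1)


# Hypothetical two-class example: the audio sub-system favours class 0,
# the video sub-system favours class 1 more strongly.
fused, decision = sum_rule_fusion([0.6, 0.4], [0.2, 0.8])
print(fused, decision)
```

In this example the video sub-system is more confident, so the fused decision follows it. Unequal weights could be used if one modality is known to be more reliable, but that is a design choice beyond what the abstract states.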

Author information

Authors and Affiliations

Faculty of Electrical Engineering, University of Ljubljana, Tržaška 25, SI-1000, Ljubljana, Slovenia
Rok Gajšek, Vitomir Štruc, Simon Dobrišek, Janez Žibert, France Mihelič & Nikola Pavešić

Editor information

Editors and Affiliations

Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/ Francisco Tomas y Valiente 11, 28049, Madrid, Spain
Julian Fierrez & Javier Ortega-Garcia
Second University of Naples, and IIASS, Via Vivaldi 43, 81100, Caserta, Italy
Anna Esposito
EPFL, Speech Processing and Biometrics Group, EPFL-STI-IEL-LIDIAP, ELE 233, Station 11, 1015, Lausanne, Switzerland
Andrzej Drygajlo
Escola Universitària Politècnica de Mataró, Avda. Puig i Cadafalch 101-111, 08303, Mataro (Barcelona), Spain
Marcos Faundez-Zanuy

Cite this paper

Gajšek, R., Štruc, V., Dobrišek, S., Žibert, J., Mihelič, F., Pavešić, N. (2009). Combining Audio and Video for Detection of Spontaneous Emotions. In: Fierrez, J., Ortega-Garcia, J., Esposito, A., Drygajlo, A., Faundez-Zanuy, M. (eds) Biometric ID Management and Multimodal Communication. BioID 2009. Lecture Notes in Computer Science, vol 5707. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04391-8_15

DOI: https://doi.org/10.1007/978-3-642-04391-8_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04390-1
Online ISBN: 978-3-642-04391-8
eBook Packages: Computer Science, Computer Science (R0)
