Abstract
We explore ways to improve the generality, portability and robustness of emotion recognition systems by combining databases and by fusing classifiers. In a first experiment, we investigate the performance of an emotion detection system tested on a given database when it is trained on speech from the same database, from a different database, or from a mix of both. Performance generally drops when the test database does not match the training material, with a few exceptions. Performance also drops when a mixed corpus of acted databases is used for training and testing is carried out on real-life recordings. In a second experiment we train multiple emotion detectors and fuse them into a single detection system. The average Equal Error Rate (EER) drops from 19.0% for the 4 individual detectors to 4.2% when they are fused using FoCal [1].
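FoCal performs score-level fusion by affine logistic regression: each detector's score receives a learned weight, plus a shared bias, trained to discriminate target from non-target trials. The sketch below illustrates that style of fusion and the EER metric used above; the function names and the plain gradient-descent trainer are illustrative assumptions, not FoCal's actual implementation, and the calibration to log-likelihood ratios that FoCal also provides is omitted.

```python
import numpy as np

def eer(scores, labels):
    """Equal Error Rate: the operating point where the false-alarm
    rate equals the miss rate, swept over score thresholds."""
    order = np.argsort(scores)
    y = np.asarray(labels, float)[order]
    n_pos, n_neg = y.sum(), len(y) - y.sum()
    miss = np.cumsum(y) / n_pos            # targets at or below threshold
    fa = 1.0 - np.cumsum(1.0 - y) / n_neg  # non-targets above threshold
    i = np.argmin(np.abs(miss - fa))
    return (miss[i] + fa[i]) / 2.0

def fuse(score_matrix, labels, iters=2000, lr=0.5):
    """Affine logistic-regression fusion: learn one weight per detector
    plus a bias by gradient descent, and return the fused scores."""
    X = np.asarray(score_matrix, float)    # shape (n_trials, n_detectors)
    y = np.asarray(labels, float)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid of fused score
        g = p - y                               # cross-entropy gradient
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return X @ w + b
```

In practice the weights would be trained on a development set and the fused scores evaluated with `eer` on held-out test trials, mirroring the drop from the individual-detector EERs to the fused EER reported in the abstract.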
References
Brümmer, N., Burget, L., Černocký, J., Glembek, O., Grézl, F., Karafiát, M., van Leeuwen, D.A., Matějka, P., Schwarz, P., Strasheim, A.: Fusion of Heterogeneous Speaker Recognition Systems in the STBU Submission for the NIST Speaker Recognition Evaluation 2006. IEEE Transactions on Audio, Speech, and Language Processing 15(7), 2072–2084 (2007)
Pantic, M., Rothkrantz, L.J.M.: Towards an Affect-Sensitive Multimodal Human-Computer Interaction. Proceedings of the IEEE, 1370–1390 (2003)
Schuller, B., Steidl, S., Batliner, A.: The INTERSPEECH 2009 Emotion Challenge. In: Proceedings of Interspeech, pp. 312–315. ISCA (2009)
Steidl, S.: Automatic Classification of Emotion-Related User States in Spontaneous Children’s Speech, 1st edn. Logos Verlag, Berlin (2009)
Batliner, A., Steidl, S., Schuller, B., Seppi, D., Vogt, T., Wagner, J., Devillers, L., Vidrascu, L., Aharonson, V., Kessous, L., Amir, N.: Whodunnit – Searching for the Most Important Feature Types Signalling Emotion-Related User States in Speech. Computer Speech and Language (2010)
Vogt, T., André, E.: Comparing Feature Sets for Acted and Spontaneous Speech in View of Automatic Emotion Recognition. In: IEEE International Conference on Multimedia and Expo, pp. 474–477 (July 2005)
Shami, M., Verhelst, W.: Automatic Classification of Expressiveness in Speech: A Multi-corpus Study. Speaker Classification II: Selected Projects, 43–56 (2007)
Vidrascu, L., Devillers, L.: Anger Detection Performances Based on Prosodic and Acoustic Cues in Several Corpora. In: LREC 2008 (2008)
Vlasenko, B., Schuller, B., Wendemuth, A., Rigoll, G.: Combining Frame and Turn-Level Information for Robust Recognition of Emotions within Speech. In: Proceedings of Interspeech (2007)
Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W.F., Weiss, B.: A Database of German Emotional Speech. In: Proceedings of Interspeech, pp. 1517–1520 (2005)
Engberg, I.S., Hansen, A.V.: Documentation of the Danish Emotional Speech Database (DES). Internal AAU report, Center for Person Kommunikation (1996)
Martin, O., Kotsia, I., Macq, B., Pitas, I.: The eNTERFACE’05 Audio-Visual Emotion Database. In: 22nd International Conference on Data Engineering Workshops (2006)
Lefter, I., Rothkrantz, L.J.M., Wiggers, P., van Leeuwen, D.A.: Automatic Stress Detection in Emergency (Telephone) Calls. Int. J. on Intelligent Defence Support Systems (2010) (submitted)
Auckenthaler, R., Carey, M., Lloyd-Thomas, H.: Score Normalization for Text-Independent Speaker Verification Systems. Digital Signal Processing 10, 42–54 (2000)
Juslin, P.N., Scherer, K.R.: Vocal Expression of Affect. In: Harrigan, J., Rosenthal, R., Scherer, K. (eds.) The New Handbook of Methods in Nonverbal Behavior Research, pp. 65–135. Oxford University Press, Oxford (2005)
Truong, K.P., Raaijmakers, S.: Automatic Recognition of Spontaneous Emotions in Speech Using Acoustic and Lexical Features. In: Popescu-Belis, A., Stiefelhagen, R. (eds.) MLMI 2008. LNCS, vol. 5237, pp. 161–172. Springer, Heidelberg (2008)
Boersma, P.: Praat, a System for Doing Phonetics by Computer. Glot International 5(9/10), 341–345 (2001)
Chang, C.C., Lin, C.J.: LIBSVM: A Library for Support Vector Machines (2001)
Hermansky, H., Morgan, N., Bayya, A., Kohn, P.: RASTA-PLP Speech Analysis Technique. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 121–124 (1992)
Reynolds, D.A., Quatieri, T.F., Dunn, R.B.: Speaker Verification Using Adapted Gaussian Mixture Models. Digital Signal Processing 10, 19–41 (2000)
Campbell, W., Sturim, D., Reynolds, D.: Support Vector Machines Using GMM Supervectors for Speaker Verification. IEEE Signal Processing Letters 13(5), 308–311 (2006)
Brümmer, N.: Discriminative Acoustic Language Recognition via Channel-Compensated GMM Statistics. In: Proceedings of Interspeech. ISCA (2009)
Martin, A., Doddington, G., Kamm, T., Ordowski, M., Przybocki, M.: The DET Curve in Assessment of Detection Task Performance. In: Proceedings of Eurospeech 1997, pp. 1895–1898 (1997)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Lefter, I., Rothkrantz, L.J.M., Wiggers, P., van Leeuwen, D.A. (2010). Emotion Recognition from Speech by Combining Databases and Fusion of Classifiers. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2010. Lecture Notes in Computer Science(), vol 6231. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15760-8_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15759-2
Online ISBN: 978-3-642-15760-8