Quantification of Segmentation and F0 Errors and Their Effect on Emotion Recognition

Steidl, Stefan; Batliner, Anton; Nöth, Elmar; Hornegger, Joachim

doi:10.1007/978-3-540-87391-4_67

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5246)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

Prosodic features modelling pitch, energy, and duration play a major role in speech emotion recognition. Our word-level features, especially the duration and pitch features, rely on correct word segmentation and F0 extraction. For the FAU Aibo Emotion Corpus, both the automatic segmentation obtained from a forced alignment of the spoken word sequence and the automatically extracted F0 values have been manually corrected. We report the frequencies of different types of segmentation and F0 errors and evaluate their influence on emotion recognition using different groups of prosodic features. The classification results show that the impact of these errors on emotion recognition is small.
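
To make the idea of counting segmentation and F0 errors concrete, the following is a minimal sketch of how an automatic analysis could be compared against a manually corrected reference. It is not the procedure used in the paper: the names `Word`, `boundary_shifts`, and `count_f0_errors`, the convention that an F0 value of 0 marks an unvoiced frame, the 20 % tolerance, and the error categories are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Word:
    """One word of a segmentation (hypothetical container, times in seconds)."""
    label: str
    start: float
    end: float


def boundary_shifts(auto: List[Word], ref: List[Word]) -> List[Tuple[float, float]]:
    """Absolute deviation of the start and end boundary for each word.

    Assumes both segmentations contain the same word sequence, as is the
    case when a forced alignment is corrected by hand.
    """
    if [w.label for w in auto] != [w.label for w in ref]:
        raise ValueError("segmentations must contain the same word sequence")
    return [(abs(a.start - r.start), abs(a.end - r.end)) for a, r in zip(auto, ref)]


def count_f0_errors(auto_f0: List[float], ref_f0: List[float],
                    tol: float = 0.2) -> Dict[str, int]:
    """Count frame-wise F0 error types against a corrected reference.

    Assumptions: both tracks use the same frame rate, 0 marks an unvoiced
    frame, and `tol` is a relative tolerance (20 % by default).
    """
    if len(auto_f0) != len(ref_f0):
        raise ValueError("F0 tracks must have the same number of frames")
    counts = {"correct": 0, "voicing": 0, "octave": 0, "other_gross": 0}
    for a, r in zip(auto_f0, ref_f0):
        if (a > 0) != (r > 0):
            counts["voicing"] += 1          # voiced/unvoiced decision differs
        elif r == 0:
            counts["correct"] += 1          # both frames unvoiced
        else:
            ratio = a / r
            if abs(ratio - 1.0) <= tol:
                counts["correct"] += 1      # within tolerance of the reference
            elif abs(ratio - 2.0) <= 2.0 * tol or abs(ratio - 0.5) <= 0.5 * tol:
                counts["octave"] += 1       # roughly double or half the reference
            else:
                counts["other_gross"] += 1  # any other large deviation
    return counts


if __name__ == "__main__":
    auto_words = [Word("Aibo", 0.10, 0.52)]
    ref_words = [Word("Aibo", 0.12, 0.50)]
    print(boundary_shifts(auto_words, ref_words))

    auto_track = [0.0, 200.0, 400.0, 100.0, 150.0, 0.0, 260.0]
    ref_track = [0.0, 200.0, 200.0, 200.0, 0.0, 120.0, 200.0]
    print(count_f0_errors(auto_track, ref_track))
```

Dividing each count by the number of frames gives relative error frequencies; the same comparison could be repeated per feature group to see which features are most sensitive to a given error type.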

Author information

Authors and Affiliations

Lehrstuhl für Mustererkennung, Friedrich-Alexander-Universität Erlangen-Nürnberg, Martensstraße 3, D-91058, Erlangen, Germany
Stefan Steidl, Anton Batliner, Elmar Nöth & Joachim Hornegger

Editor information

Petr Sojka, Aleš Horák, Ivan Kopeček, Karel Pala

About this paper

Cite this paper

Steidl, S., Batliner, A., Nöth, E., Hornegger, J. (2008). Quantification of Segmentation and F0 Errors and Their Effect on Emotion Recognition. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_67

DOI: https://doi.org/10.1007/978-3-540-87391-4_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87390-7
Online ISBN: 978-3-540-87391-4
eBook Packages: Computer Science, Computer Science (R0)
