Abstract
Prosodic features that model pitch, energy, and duration play a major role in speech emotion recognition. Our word-level features, especially the duration and pitch features, rely on correct word segmentation and F0 extraction. For the FAU Aibo Emotion Corpus, both the automatic word segmentation obtained by forced alignment of the spoken word sequence and the automatically extracted F0 values have been manually corrected. We report the frequencies of different types of segmentation and F0 errors and evaluate their influence on emotion recognition using different groups of prosodic features. The classification results show that the impact of these errors on emotion recognition is small.
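To illustrate why word-level duration and pitch features depend on the segmentation, the following is a minimal sketch and not the feature extraction used in the paper. The function name `word_features`, the 10 ms frame shift, the convention that 0 marks an unvoiced frame, and the toy F0 contour are all assumptions made for this example.

```python
from statistics import mean

FRAME_SHIFT_S = 0.01  # assumed frame shift of 10 ms (hypothetical)


def word_features(f0, start_s, end_s, frame_shift_s=FRAME_SHIFT_S):
    """Duration and mean F0 over the voiced frames of one word.

    f0 is a list of per-frame F0 values in Hz; 0 marks an unvoiced
    frame (an assumed convention, not taken from the paper).
    """
    first = round(start_s / frame_shift_s)
    last = round(end_s / frame_shift_s)
    voiced = [v for v in f0[first:last] if v > 0]
    return {
        "duration_s": end_s - start_s,
        "mean_f0_hz": mean(voiced) if voiced else None,
    }


# Toy contour: pause, a word at 200 Hz, pause, the next word at 300 Hz.
f0 = [0.0] * 5 + [200.0] * 10 + [0.0] * 5 + [300.0] * 10

# Hypothetical automatic boundary that ends the first word too late,
# so that frames of the following word leak into it.
auto = word_features(f0, 0.05, 0.25)

# Hypothetical manually corrected boundary for the same word.
manual = word_features(f0, 0.05, 0.15)

print(auto)
print(manual)
```

In this toy case the misplaced boundary doubles the word duration and raises the mean F0, which is the kind of effect on duration and pitch features whose impact on classification the paper quantifies.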
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Steidl, S., Batliner, A., Nöth, E., Hornegger, J. (2008). Quantification of Segmentation and F0 Errors and Their Effect on Emotion Recognition. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_67
DOI: https://doi.org/10.1007/978-3-540-87391-4_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87390-7
Online ISBN: 978-3-540-87391-4
eBook Packages: Computer Science, Computer Science (R0)