Abstract
We present a study on the automatic classification of expressiveness in speech using four databases that fall into two distinct groups: the first group of two databases contains adult speech directed at infants, while the second contains adult speech directed at adults. We performed experiments with two approaches to feature extraction, the approach developed for Sony's robotic dog AIBO (AIBO) and a segment-based approach (SBA), and with three machine learning algorithms for training the classifiers. In mono-corpus experiments, the classifiers were trained and tested on each database individually. The results show that AIBO and SBA are competitive on the four databases considered, although the AIBO approach works better with long utterances whereas the SBA seems better suited to the classification of short utterances. When training was performed on one database and testing on another database of the same group, little generalization across databases was observed, because emotions with the same label occupy different regions of the feature space in the different databases. When the databases are merged, however, classification results are comparable to those of the within-database experiments, indicating that the existing approaches to the classification of emotions in speech are efficient enough to handle larger amounts of training data without any reduction in classification accuracy. This should lead to classifiers that are more robust to varying styles of expressiveness in speech.
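The cross-corpus effect described in the abstract, where a classifier trained on one database transfers poorly because identical emotion labels occupy different feature-space regions, yet merging the corpora restores accuracy, can be illustrated with a small synthetic sketch. Everything below is a hypothetical stand-in: the two-dimensional Gaussian "corpora" and the 1-nearest-neighbour classifier are illustrative assumptions, not the paper's actual AIBO/SBA features or learning algorithms.

```python
import numpy as np

def make_corpus(rng, centers, n_per_class=40, std=0.3):
    """Synthetic 2-D 'acoustic features' for two emotion classes.

    Each class is a tight Gaussian cluster. The two corpora place the
    same label at different feature-space locations, mimicking the
    label/region mismatch described in the abstract.
    """
    X, y = [], []
    for label, c in enumerate(centers):
        X.append(rng.normal(loc=c, scale=std, size=(n_per_class, 2)))
        y.append(np.full(n_per_class, label))
    return np.vstack(X), np.concatenate(y)

def knn1_accuracy(X_train, y_train, X_test, y_test):
    """1-nearest-neighbour classification accuracy (Euclidean distance)."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    pred = y_train[np.argmin(d, axis=1)]
    return float(np.mean(pred == y_test))

rng = np.random.default_rng(0)
# Corpus A: class 0 near (0,0), class 1 near (5,5); corpus B uses the
# same labels but in a shifted region of the feature space.
Xa, ya = make_corpus(rng, centers=[(0, 0), (5, 5)])
Xb, yb = make_corpus(rng, centers=[(10, 10), (15, 15)])

# Cross-corpus: train on A, test on B -> every B sample falls nearest
# to A's class-1 cluster, so accuracy collapses to chance level.
cross_acc = knn1_accuracy(Xa, ya, Xb, yb)

# Merged corpora: train on even-indexed samples, test on odd-indexed
# ones -> both regions are represented in training, accuracy recovers.
Xm, ym = np.vstack([Xa, Xb]), np.concatenate([ya, yb])
merged_acc = knn1_accuracy(Xm[::2], ym[::2], Xm[1::2], ym[1::2])

print(f"cross-corpus accuracy:  {cross_acc:.2f}")
print(f"merged-corpus accuracy: {merged_acc:.2f}")
```

The sketch reproduces only the qualitative pattern the study reports: near-chance cross-corpus transfer, and merged-corpus performance comparable to within-corpus training.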
This paper summarizes research that was reported in the manuscript “An evaluation of the robustness of existing supervised machine learning approaches to the classification of emotions in speech”, which was accepted for publication in Speech Communication.
© 2007 Springer-Verlag Berlin Heidelberg
Cite this chapter
Shami, M., Verhelst, W. (2007). Automatic Classification of Expressiveness in Speech: A Multi-corpus Study. In: Müller, C. (eds) Speaker Classification II. Lecture Notes in Computer Science(), vol 4441. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74122-0_5
Print ISBN: 978-3-540-74121-3
Online ISBN: 978-3-540-74122-0