Automatic Classification of Expressiveness in Speech: A Multi-corpus Study

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4441)

Abstract

We present a study on the automatic classification of expressiveness in speech using four databases that belong to two distinct groups: the first group of two databases contains adult speech directed to infants, while the second group contains adult speech directed to adults. We performed experiments with two approaches to feature extraction, the approach developed for Sony’s robotic dog AIBO (AIBO) and a segment-based approach (SBA), and with three machine learning algorithms for training the classifiers. In mono-corpus experiments, the classifiers were trained and tested on each database individually. The results show that AIBO and SBA are competitive on the four databases considered, although the AIBO approach works better with long utterances, whereas the SBA seems better suited to the classification of short utterances. When training was performed on one database and testing on another database of the same group, little generalization across databases was observed, because emotions with the same label occupy different regions of the feature space in the different databases. When the databases are merged, however, classification results are comparable to those of the within-database experiments. This indicates that the existing approaches to the classification of emotions in speech can handle larger amounts of training data without loss of classification accuracy, which should lead to classifiers that are more robust to varying styles of expressiveness in speech.

This paper summarizes research that was reported in the manuscript “An evaluation of the robustness of existing supervised machine learning approaches to the classification of emotions in speech”, which was accepted for publication in Speech Communication.
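
As a rough illustration of the experimental design, the sketch below contrasts the three evaluation setups described above: mono-corpus (train and test within one database), cross-corpus (train on one database, test on another), and merged-corpus (pool the databases before training). It is a minimal sketch only: the random feature vectors stand in for the AIBO- or SBA-style prosodic features, RandomForestClassifier is an arbitrary stand-in for the three learning algorithms used in the study, and the corpus sizes, labels, and feature dimension are illustrative rather than taken from the paper.

```python
# Sketch of the three evaluation setups: mono-, cross-, and merged-corpus.
# Random placeholder features stand in for extracted prosodic statistics.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
N_FEATURES = 32                      # e.g. per-utterance pitch/energy statistics
LABELS = ["approval", "attention", "prohibition", "soothing"]  # illustrative

def make_corpus(n_utterances):
    """Stand-in for feature extraction over one labelled emotion corpus."""
    X = rng.normal(size=(n_utterances, N_FEATURES))
    y = rng.choice(LABELS, size=n_utterances)
    return X, y

X_a, y_a = make_corpus(300)          # corpus A (e.g. infant-directed speech)
X_b, y_b = make_corpus(300)          # corpus B from the same group

clf = RandomForestClassifier(random_state=0)

# Mono-corpus: train and test within corpus A (simple holdout split).
clf.fit(X_a[:200], y_a[:200])
print("mono-corpus:  ", accuracy_score(y_a[200:], clf.predict(X_a[200:])))

# Cross-corpus: train on all of A, test on all of B.
clf.fit(X_a, y_a)
print("cross-corpus: ", accuracy_score(y_b, clf.predict(X_b)))

# Merged-corpus: pool both corpora, then evaluate on a held-out slice.
X_m, y_m = np.vstack([X_a, X_b]), np.concatenate([y_a, y_b])
idx = rng.permutation(len(y_m))
train, test = idx[:400], idx[400:]
clf.fit(X_m[train], y_m[train])
print("merged-corpus:", accuracy_score(y_m[test], clf.predict(X_m[test])))
```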

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Shami, M., Verhelst, W. (2007). Automatic Classification of Expressiveness in Speech: A Multi-corpus Study. In: Müller, C. (ed.) Speaker Classification II. Lecture Notes in Computer Science, vol. 4441. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74122-0_5

  • DOI: https://doi.org/10.1007/978-3-540-74122-0_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74121-3

  • Online ISBN: 978-3-540-74122-0

  • eBook Packages: Computer Science (R0)
