Hierarchical Clustering and Classification of Emotions in Human Speech Using Confusion Matrices

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8113)

Abstract

Although humans can clearly identify most of the natural emotions expressed in speech, automatic classification systems still display significant limitations on this task. Recently, hierarchical strategies have been proposed that use different heuristics for choosing the appropriate levels in the hierarchy. In this paper, we propose a method for choosing these levels by hierarchically clustering a confusion matrix. To this end, a Mexican Spanish emotional speech database was created and employed to classify the "big six" emotions (anger, disgust, fear, joy, sadness, surprise) together with a neutral state. A set of 14 features was extracted from the speech signal of each utterance, and a hierarchical classifier was defined from the dendrogram obtained by applying Ward's clustering method to a certain confusion matrix. The classification rate of this hierarchical classifier showed a slight improvement over those of various classifiers trained directly on all 7 classes.
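The idea of deriving hierarchy levels from a confusion matrix can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which confusion matrix was clustered or how dissimilarities were derived from it. The counts below are invented for illustration, and treating each row-normalized row as a class "profile" compared by Euclidean distance is an assumption. The function names (`row_normalize`, `ward_cost`, `ward_merges`) are hypothetical.

```python
from itertools import combinations

LABELS = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral"]

# Invented counts, for illustration only (rows: true class, columns: predicted).
CONFUSION = [
    [70, 5, 2, 15, 1, 6, 1],    # anger
    [6, 60, 8, 4, 10, 4, 8],    # disgust
    [3, 8, 55, 4, 12, 14, 4],   # fear
    [16, 4, 3, 62, 1, 12, 2],   # joy
    [1, 8, 6, 1, 60, 2, 22],    # sadness
    [7, 4, 12, 13, 2, 60, 2],   # surprise
    [1, 7, 3, 2, 24, 2, 61],    # neutral
]


def row_normalize(matrix):
    """Turn each row of counts into a distribution over predicted classes."""
    return [[count / sum(row) for count in row] for row in matrix]


def ward_cost(cluster_a, cluster_b):
    """Ward criterion: growth in within-cluster sum of squares after merging."""
    (members_a, centroid_a), (members_b, centroid_b) = cluster_a, cluster_b
    n_a, n_b = len(members_a), len(members_b)
    squared_dist = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
    return n_a * n_b / (n_a + n_b) * squared_dist


def ward_merges(labels, profiles):
    """Naive agglomerative clustering; returns merges in the order they occur."""
    clusters = {i: ((label,), list(profile))
                for i, (label, profile) in enumerate(zip(labels, profiles))}
    merges = []
    next_id = len(clusters)
    while len(clusters) > 1:
        # Pick the pair of clusters whose merge is cheapest under Ward's criterion.
        i, j = min(combinations(sorted(clusters), 2),
                   key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]))
        cost = ward_cost(clusters[i], clusters[j])
        (members_a, centroid_a), (members_b, centroid_b) = clusters.pop(i), clusters.pop(j)
        n_a, n_b = len(members_a), len(members_b)
        # The merged centroid is the size-weighted mean of the two centroids.
        centroid = [(n_a * x + n_b * y) / (n_a + n_b)
                    for x, y in zip(centroid_a, centroid_b)]
        clusters[next_id] = (members_a + members_b, centroid)
        merges.append((members_a, members_b, cost))
        next_id += 1
    return merges


merges = ward_merges(LABELS, row_normalize(CONFUSION))
for left, right, cost in merges:
    print(f"{cost:.4f}  {left} + {right}")
```

Read from the last merge back to the first, the merge list describes a binary tree: the final merge is the root split, and each earlier merge is a deeper decision node. One way to use such a tree, again as an assumption rather than the paper's stated design, is to train one classifier per internal node to choose between its two child groups.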

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Reyes-Vargas, M. et al. (2013). Hierarchical Clustering and Classification of Emotions in Human Speech Using Confusion Matrices. In: Železný, M., Habernal, I., Ronzhin, A. (eds) Speech and Computer. SPECOM 2013. Lecture Notes in Computer Science, vol 8113. Springer, Cham. https://doi.org/10.1007/978-3-319-01931-4_22

  • DOI: https://doi.org/10.1007/978-3-319-01931-4_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-01930-7

  • Online ISBN: 978-3-319-01931-4

  • eBook Packages: Computer Science, Computer Science (R0)
