Machine Learning Approaches for Speech Emotion Recognition: Classic and Novel Advances

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10762)

Abstract

Speech is the most natural form of human communication and, among other things, conveys information about the speaker’s emotional state. This study addresses automatic speech emotion recognition using classic and novel machine learning approaches applied to simulated emotional speech data. Three approaches are used: individual Gaussian mixture models (GMMs) trained for each emotion; a universal background model (UBM-GMM) adapted to each emotion using maximum a posteriori (MAP) adaptation; and an approach based on the i-vector paradigm, which is widely used in speaker recognition and language identification and is adapted here to emotion recognition. With individual GMMs, a novel technique based on multiple classifiers and late fusion is also applied, yielding a 90.9% recognition rate. When the state-of-the-art i-vector method is used together with a probabilistic linear discriminant analysis (PLDA) model, a 91.4% average rate is achieved for speaker-independent Japanese speech emotion recognition, a very promising result that is superior to those of similar studies. In addition to Japanese emotion recognition, pair-wise recognition of seven emotions in German was also conducted. The recognition rates obtained with the German database show the same tendency as those for Japanese; in this experiment, an 89.2% average rate was achieved.
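
As a rough illustration of the per-emotion GMM idea, the sketch below scores an utterance against one diagonal-covariance GMM per emotion and picks the emotion with the highest log-likelihood. It also shows one possible form of late fusion, a weighted sum of per-classifier scores. Everything here is an assumption made for illustration: the toy model parameters, the emotion labels, the function names, and the fusion rule are hypothetical and are not taken from the paper, which does not specify these details in the abstract.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of all frames under one GMM.

    gmm is a list of (weight, mean, var) tuples, one per mixture component.
    """
    total = 0.0
    for x in frames:
        comp = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        top = max(comp)
        # log-sum-exp over components for numerical stability
        total += top + math.log(sum(math.exp(c - top) for c in comp))
    return total


def classify(frames, emotion_gmms):
    """Return (best_emotion, scores) using a maximum-likelihood decision."""
    scores = {emo: gmm_log_likelihood(frames, g) for emo, g in emotion_gmms.items()}
    return max(scores, key=scores.get), scores


def late_fusion(score_dicts, weights=None):
    """Fuse per-classifier scores with a weighted sum (assumed fusion rule)."""
    if weights is None:
        weights = [1.0] * len(score_dicts)
    fused = {}
    for w, scores in zip(weights, score_dicts):
        for emo, s in scores.items():
            fused[emo] = fused.get(emo, 0.0) + w * s
    return max(fused, key=fused.get), fused


# Hypothetical two-component GMMs over 2-D features (toy values, not trained).
models = {
    "happy": [(0.5, [2.0, 2.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])],
    "sad": [(0.5, [-2.0, -2.0], [1.0, 1.0]), (0.5, [-3.0, -3.0], [1.0, 1.0])],
}
frames = [[2.1, 1.9], [2.8, 3.2]]  # hypothetical feature frames of one utterance
label, _ = classify(frames, models)
print(label)
```

In a real system the GMM parameters would be estimated from training data (for example with EM), and the fusion weights would be tuned on held-out data; both steps are omitted here.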

Author information

Corresponding author

Correspondence to Panikos Heracleous.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Heracleous, P., Ishikawa, A., Yasuda, K., Kawashima, H., Sugaya, F., Hashimoto, M. (2018). Machine Learning Approaches for Speech Emotion Recognition: Classic and Novel Advances. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol 10762. Springer, Cham. https://doi.org/10.1007/978-3-319-77116-8_14

  • DOI: https://doi.org/10.1007/978-3-319-77116-8_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77115-1

  • Online ISBN: 978-3-319-77116-8

  • eBook Packages: Computer Science, Computer Science (R0)
