Utilizing Psychoacoustic Modeling to Improve Speech-Based Emotion Recognition

  • Conference paper
Speech and Computer (SPECOM 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11096)

Abstract

Compression methods are usually avoided in emotion recognition, as it is feared that compression degrades the acoustic characteristics needed for accurate recognition. By contrast, we assume that the psychoacoustic modeling used for transparent music compression could actually improve speech-based emotion recognition, as it removes parts of the acoustic signal that are considered “unnecessary” while retaining the full emotional information.

To test this assumption and verify its generalizability, we conducted several recognition experiments on different datasets. Depending on the dataset, we achieved performance gains between 0.94% and 4.86% absolute. Furthermore, we identified the features that are modified by the psychoacoustic modeling and confirmed, through additional recognition experiments, that the modification of these features is responsible for the observed performance increase. Although the feature influence is dataset-specific, a small group of four low-level feature descriptors is shared among all three datasets.

Acknowledgements

This work has further been sponsored by the German Federal Ministry of Education and Research in the program Zwanzig20 – Partnership for Innovation as part of the research alliance 3Dsensation. One of us (A.F. Lotz) wishes to acknowledge funding from the European Union’s Horizon 2020 research and innovation programme in the project “ADAS&Me” under grant agreement No. 68890.

Author information

Corresponding author

Correspondence to Ingo Siegert.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Siegert, I., Lotz, A.F., Egorow, O., Wolff, S. (2018). Utilizing Psychoacoustic Modeling to Improve Speech-Based Emotion Recognition. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_64

  • DOI: https://doi.org/10.1007/978-3-319-99579-3_64

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99578-6

  • Online ISBN: 978-3-319-99579-3

  • eBook Packages: Computer Science (R0)
