Utilizing Psychoacoustic Modeling to Improve Speech-Based Emotion Recognition

Siegert, Ingo; Lotz, Alicia Flores; Egorow, Olga; Wolff, Susann

doi:10.1007/978-3-319-99579-3_64

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11096)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

Compression methods are usually avoided in emotion recognition, because compression is feared to degrade the acoustic characteristics needed for accurate recognition. By contrast, we assume that the psychoacoustic modeling used for transparent music compression could actually improve speech-based emotion recognition: it removes parts of the acoustic signal that are considered “unnecessary”, while the remaining signal still contains the full emotional information.

To test this assumption and verify that it generalizes, we conducted several recognition experiments on three different datasets. Depending on the dataset, we achieved performance gains between 0.94% and 4.86% absolute. Furthermore, we identified the features that are modified by the psychoacoustic modeling and confirmed, through additional recognition experiments, that the modification of these features is responsible for the observed performance increase. Although the feature influence is dataset-specific, a small group of four low-level feature descriptors is shared among all three datasets.
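
As a reading aid for the figures above, the following minimal sketch shows the difference between an absolute gain (a difference in percentage points) and a relative gain (a fraction of the baseline). The baseline value and both function names are hypothetical, chosen only for illustration; the abstract does not report baseline performance.

```python
def absolute_gain(baseline: float, improved: float) -> float:
    """Gain in percentage points; both inputs are percentages."""
    return improved - baseline


def relative_gain(baseline: float, improved: float) -> float:
    """Gain expressed as a percentage of the baseline value."""
    return (improved - baseline) / baseline * 100.0


# Hypothetical baseline of 70.00%; NOT a figure taken from the paper.
baseline = 70.00

# The two gains below are the absolute gains quoted in the abstract.
for gain in (0.94, 4.86):
    improved = baseline + gain
    print(
        f"{absolute_gain(baseline, improved):.2f} points absolute, "
        f"{relative_gain(baseline, improved):.2f}% relative"
    )
```

The same absolute gain corresponds to a larger relative gain when the baseline is lower, which is why the two measures should not be compared directly across datasets.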

Acknowledgements

This work was sponsored by the German Federal Ministry of Education and Research in the program Zwanzig20 – Partnership for Innovation as part of the research alliance 3Dsensation. One of us (A.F. Lotz) wishes to acknowledge funding from the European Union’s Horizon 2020 research and innovation programme in the project “ADAS&Me” under grant agreement No. 68890.

Author information

Authors and Affiliations

Cognitive Systems Group, Otto von Guericke University, 39106, Magdeburg, Germany
Ingo Siegert, Alicia Flores Lotz & Olga Egorow
Special Lab Non-Invasive Brain Imaging, Leibniz Institute for Neurobiology, 39118, Magdeburg, Germany
Susann Wolff

Corresponding author

Correspondence to Ingo Siegert.

Editor information

Editors and Affiliations

SPIIRAS, St. Petersburg, Russia
Alexey Karpov
Leipzig University of Telecommunications, Leipzig, Germany
Oliver Jokisch
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

About this paper

Cite this paper

Siegert, I., Lotz, A.F., Egorow, O., Wolff, S. (2018). Utilizing Psychoacoustic Modeling to Improve Speech-Based Emotion Recognition. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_64

DOI: https://doi.org/10.1007/978-3-319-99579-3_64
Published: 25 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99578-6
Online ISBN: 978-3-319-99579-3
eBook Packages: Computer Science, Computer Science (R0)
