Abstract
Compression methods are usually avoided in emotion recognition, as it is feared that compression degrades the acoustic characteristics needed for accurate recognition. By contrast, we assume that the psychoacoustic modeling used for transparent music compression could actually improve speech-based emotion recognition, as it removes parts of the acoustic signal that are considered “unnecessary”, while the remaining signal still contains the full emotional information.
To test this assumption and its generalizability, we conducted several recognition experiments on three different datasets. Depending on the dataset, we achieved performance gains between 0.94% and 4.86% absolute. Furthermore, we identified the features that are modified by the psychoacoustic modeling and confirmed through additional recognition experiments that the modification of these features is responsible for the observed performance increase. Although the feature influence is dataset-specific, a small group of four low-level feature descriptors is shared among all three datasets.
Acknowledgements
This work has further been sponsored by the German Federal Ministry of Education and Research in the program Zwanzig20 – Partnership for Innovation as part of the research alliance 3Dsensation. One of us (A.F. Lotz) wishes to acknowledge funding from the European Union’s Horizon 2020 research and innovation programme in the project “ADAS&Me” under grant agreement No. 68890.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Siegert, I., Lotz, A.F., Egorow, O., Wolff, S. (2018). Utilizing Psychoacoustic Modeling to Improve Speech-Based Emotion Recognition. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_64
DOI: https://doi.org/10.1007/978-3-319-99579-3_64
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99578-6
Online ISBN: 978-3-319-99579-3
eBook Packages: Computer Science, Computer Science (R0)