
Salient Cross-Lingual Acoustic and Prosodic Features for English and German Emotion Recognition

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 427)

Abstract

While approaches to automatic recognition of human emotion from speech have already achieved reasonable results, substantial room for improvement remains. In our research, we select the most essential features by applying a self-adaptive multi-objective genetic algorithm. The proposed approach is evaluated on data from two languages (English and German) with two feature sets of 37 and 384 dimensions, respectively. The developed technique increases emotion recognition performance by up to 49.8% relative improvement in accuracy. Furthermore, to identify salient features across speech data from different languages, we analysed how often each feature was selected and generated a feature ranking. Based on this ranking, we created a feature set for speech-based emotion recognition consisting of the most salient features. Applying this feature set yields a relative improvement of up to 37.3% without the need for time-consuming feature selection with a genetic algorithm.
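The selection-count ranking idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses a self-adaptive multi-objective genetic algorithm, while the sketch below is a simplified single-objective GA (elitism, one-point crossover, bit-flip mutation) with a toy fitness function standing in for emotion-recognition accuracy. Feature counts and the "salient" feature indices are invented for the example.

```python
# Sketch only: GA-based feature selection, then ranking features by how
# often they appear in the best subset across independent runs.
import random

def run_ga(n_features, fitness, generations=30, pop_size=20, seed=0):
    """Evolve boolean feature masks; return the best mask found."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        elite = scored[: pop_size // 2]          # keep the best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)   # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_features)        # single bit-flip mutation
            child[i] = not child[i]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

# Toy surrogate fitness: features 0-4 are "salient"; every extra
# selected feature carries a small cost (mimicking overfitting risk).
def fitness(mask):
    return sum(mask[:5]) - 0.2 * sum(mask[5:])

# Rank features by selection count over several independent GA runs.
counts = [0] * 20
for seed in range(10):
    best = run_ga(20, fitness, seed=seed)
    for i, used in enumerate(best):
        counts[i] += used
ranking = sorted(range(20), key=lambda i: counts[i], reverse=True)
print(ranking[:5])   # the most frequently selected features
```

In the paper this ranking is computed over real feature sets (37 and 384 dimensions) and languages, and the top-ranked features form the reusable salient feature set that avoids rerunning the GA.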



Author information

Correspondence to Maxim Sidorov.


Copyright information

© 2017 Springer Science+Business Media Singapore

About this chapter

Cite this chapter

Sidorov, M., Brester, C., Ultes, S., Schmitt, A. (2017). Salient Cross-Lingual Acoustic and Prosodic Features for English and German Emotion Recognition. In: Jokinen, K., Wilcock, G. (eds) Dialogues with Social Robots. Lecture Notes in Electrical Engineering, vol 427. Springer, Singapore. https://doi.org/10.1007/978-981-10-2585-3_12

  • DOI: https://doi.org/10.1007/978-981-10-2585-3_12

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2584-6

  • Online ISBN: 978-981-10-2585-3

  • eBook Packages: Engineering (R0)
