
Salient Cross-Lingual Acoustic and Prosodic Features for English and German Emotion Recognition

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 427)

Abstract

While approaches to automatic recognition of human emotion from speech have already achieved reasonable results, substantial room for improvement remains. In our research, we select the most essential features by applying a self-adaptive multi-objective genetic algorithm. The proposed approach is evaluated on data from two languages (English and German) with two feature sets of 37 and 384 dimensions, respectively. The developed technique increases emotion recognition performance by up to 49.8% relative improvement in accuracy. Furthermore, to identify salient features across speech data from different languages, we analysed how often each feature was selected and generated a feature ranking. Based on this ranking, we created a feature set for speech-based emotion recognition consisting of the most salient features. Applying this feature set yields a relative improvement of up to 37.3% without the need for time-consuming feature selection with a genetic algorithm.
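The selection-count ranking idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses a self-adaptive multi-objective genetic algorithm, while the sketch below is a simplified single-objective GA (elitism, one-point crossover, bit-flip mutation) with a toy fitness function standing in for emotion-recognition accuracy. Feature counts and the "salient" feature indices are invented for the example.

```python
# Sketch only: GA-based feature selection, then ranking features by how
# often they appear in the best subset across independent runs.
import random

def run_ga(n_features, fitness, generations=30, pop_size=20, seed=0):
    """Evolve boolean feature masks; return the best mask found."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        elite = scored[: pop_size // 2]          # keep the best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)   # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_features)        # single bit-flip mutation
            child[i] = not child[i]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

# Toy surrogate fitness: features 0-4 are "salient"; every extra
# selected feature carries a small cost (mimicking overfitting risk).
def fitness(mask):
    return sum(mask[:5]) - 0.2 * sum(mask[5:])

# Rank features by selection count over several independent GA runs.
counts = [0] * 20
for seed in range(10):
    best = run_ga(20, fitness, seed=seed)
    for i, used in enumerate(best):
        counts[i] += used
ranking = sorted(range(20), key=lambda i: counts[i], reverse=True)
print(ranking[:5])   # the most frequently selected features
```

In the paper this ranking is computed over real feature sets (37 and 384 dimensions) and languages, and the top-ranked features form the reusable salient feature set that avoids rerunning the GA.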



Author information

Correspondence to Maxim Sidorov.


Copyright information

© 2017 Springer Science+Business Media Singapore

About this chapter

Cite this chapter

Sidorov, M., Brester, C., Ultes, S., Schmitt, A. (2017). Salient Cross-Lingual Acoustic and Prosodic Features for English and German Emotion Recognition. In: Jokinen, K., Wilcock, G. (eds) Dialogues with Social Robots. Lecture Notes in Electrical Engineering, vol 427. Springer, Singapore. https://doi.org/10.1007/978-981-10-2585-3_12

  • DOI: https://doi.org/10.1007/978-981-10-2585-3_12

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2584-6

  • Online ISBN: 978-981-10-2585-3

  • eBook Packages: Engineering (R0)
