Abstract
This paper presents a hybrid neural approach to emotion recognition from speech that combines feature selection using principal component analysis (PCA) with unsupervised neural clustering using a self-organising map (SOM). Given the importance of emotions to humans, robots are unlikely to be accepted as anything more than machines unless they can express and recognise emotions. We show that this unsupervised approach achieves performance similar to that of current supervised intelligent approaches. Performance drops, however, when the system is tested on samples from a male volunteer who was not in the training set, recorded with a low-cost microphone. An unsupervised neural approach also makes it possible to go beyond basic binary classification of emotions and to consider the similarity between emotions and whether speech can express multiple emotions at the same time.
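The two-stage idea described in the abstract (PCA to reduce the feature space, then a SOM to cluster the reduced vectors without labels) can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic data, the 12-dimensional feature size, the 4x4 map, the learning-rate and neighbourhood schedules, and the function names `pca_reduce`, `train_som` and `best_matching_units` are all assumptions chosen for illustration.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project feature vectors onto their top principal components."""
    centred = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a small rectangular SOM with a Gaussian neighbourhood."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(rows * cols, data.shape[1]))
    # Grid position of every unit, used to measure neighbourhood distance.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(data)):
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighbourhood radius
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            grid_dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            influence = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull the winning unit and its grid neighbours towards the sample.
            weights += lr * influence[:, None] * (x - weights)
            step += 1
    return weights


def best_matching_units(data, weights):
    """Return the index of the closest SOM unit for each sample."""
    dist2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return dist2.argmin(axis=1)


# Synthetic stand-in for acoustic feature vectors (NOT real speech data):
# two groups of 50 samples with different means in a 12-dimensional space.
rng = np.random.default_rng(42)
features = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 12)),
    rng.normal(4.0, 1.0, size=(50, 12)),
])

reduced = pca_reduce(features, n_components=3)   # assumed number of components
som = train_som(reduced, rows=4, cols=4)         # assumed 4x4 map
units = best_matching_units(reduced, som)        # one map unit per sample
```

Because no labels are used in training, each sample is simply assigned to a map unit. Inspecting which units respond to which emotions afterwards is what would allow the similarity between emotions to be read off the map topology.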
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Traista, A., Elshaw, M. (2012). A Hybrid Neural Emotion Recogniser for Human-Robotic Agent Interaction. In: Jayne, C., Yue, S., Iliadis, L. (eds) Engineering Applications of Neural Networks. EANN 2012. Communications in Computer and Information Science, vol 311. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32909-8_36
DOI: https://doi.org/10.1007/978-3-642-32909-8_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32908-1
Online ISBN: 978-3-642-32909-8
eBook Packages: Computer Science, Computer Science (R0)