Abstract
Emotional content analysis is increasingly prevalent in speech-based human-machine interaction, in tasks such as emotion recognition and expressive speech synthesis. Within this framework, this paper aims to compare the emotional content of a pair of speech signals uttered by different speakers and not necessarily sharing the same text. This exploratory work employs machine learning methods to analyze emotional content in speech from two angles: (a) evaluating the relevance of the features used in the analysis of emotions, and (b) calculating the similarity of emotional content independently of speaker and text. The final goal is to provide a metric for comparing emotional content in speech. Such a metric would form the basis for higher-level tasks, such as clustering utterances by emotional content or applying kernel methods to expressive speech analysis.
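To make the goal concrete, the following is a minimal sketch of what such a similarity metric could look like; it is not the method developed in the paper. The MFCC features, the mean/standard-deviation pooling, and the file names are illustrative assumptions standing in for whatever emotion-relevant feature set is actually selected.

```python
# Illustrative sketch only: MFCCs stand in for the paper's feature set.
import librosa
import numpy as np

def utterance_embedding(path, sr=16000, n_mfcc=13):
    """Pool frame-level MFCCs into a fixed-length utterance vector
    (mean and standard deviation over time)."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, n_frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def emotional_similarity(path_a, path_b):
    """Cosine similarity between two utterance-level feature vectors."""
    a, b = utterance_embedding(path_a), utterance_embedding(path_b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical usage: compare recordings by two different speakers.
# score = emotional_similarity("speaker1_angry.wav", "speaker2_angry.wav")
```

In practice, such a raw acoustic similarity is confounded by speaker identity and textual content; the point of the feature analysis in the paper is precisely to identify features for which the comparison reflects emotional content alone.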
Acknowledgments
This work was supported by the research grant "Fondi di ricerca di ateneo 2016" of the University of Genova.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Mnasri, Z., Rovetta, S., Masulli, F. (2020). Feature Analysis for Emotional Content Comparison in Speech. In: Ju, Z., Yang, L., Yang, C., Gegov, A., Zhou, D. (eds) Advances in Computational Intelligence Systems. UKCI 2019. Advances in Intelligent Systems and Computing, vol 1043. Springer, Cham. https://doi.org/10.1007/978-3-030-29933-0_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29932-3
Online ISBN: 978-3-030-29933-0
eBook Packages: Intelligent Technologies and Robotics (R0)