Abstract
Emotion recognition from speech generally relies on prosodic information. However, utterances expressing different emotions can have similar prosodic features, so it is difficult to recognize emotion using prosodic features alone.
In this paper, we propose a novel approach to emotion recognition that considers both prosodic and linguistic features. First, a set of possible emotions is output by a clustering-based emotion recognizer that uses only prosodic features. Then, a transcript produced by a speech recognizer is input to a second emotion recognizer based on the “Association Mechanism,” which outputs a set of possible emotions using only linguistic information. Finally, the intersection of the two sets of possible emotions is taken as the final result.
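The fusion step described above can be sketched as a simple set intersection. This is an illustrative sketch only, not the authors' implementation; the emotion labels and the two candidate sets are hypothetical placeholders.

```python
# Illustrative sketch of the fusion step: the final result is the
# intersection of the candidate-emotion sets produced by the
# prosodic and linguistic recognizers. Labels are hypothetical.

def fuse(prosodic_candidates: set, linguistic_candidates: set) -> set:
    """Return the emotions supported by both recognizers."""
    return prosodic_candidates & linguistic_candidates

# Hypothetical outputs of the two recognizers for one utterance:
prosodic = {"anger", "joy", "sadness"}
linguistic = {"joy", "surprise"}

print(fuse(prosodic, linguistic))  # {'joy'}
```

An empty intersection would mean the two recognizers disagree entirely; how such cases are resolved is a design decision not covered by this sketch.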
Experimental results showed that the proposed method achieved higher performance than emotion recognition based on prosodic or linguistic features alone. Compared against manually labeled data, the F-measure was 32.6%, while the average F-measure of labels given by other human annotators was 42.9%. This means the proposed method performed at 75.9% of human ability.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Suzuki, M., Tsuchiya, S., Ren, F. (2011). A Novel Emotion Recognizer from Speech Using Both Prosodic and Linguistic Features. In: König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2011. Lecture Notes in Computer Science(), vol 6881. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23851-2_47
Print ISBN: 978-3-642-23850-5
Online ISBN: 978-3-642-23851-2