A Novel Emotion Recognizer from Speech Using Both Prosodic and Linguistic Features

  • Conference paper
Knowledge-Based and Intelligent Information and Engineering Systems (KES 2011)

Abstract

Emotion recognition from speech generally relies on prosodic information. However, utterances expressing different emotions often have similar prosodic features, so it is difficult to recognize emotion using prosodic features alone.

In this paper, we propose a novel approach to emotion recognition that considers both prosodic and linguistic features. First, a clustering-based emotion recognizer, which uses only prosodic features, outputs a set of possible emotions. Then, the transcript produced by a speech recognizer is fed into a second emotion recognizer based on the “Association Mechanism,” which outputs possible emotions using linguistic information alone. Lastly, the intersection of the two sets of possible emotions is taken as the final result.
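The intersection-based fusion described above can be illustrated with a minimal sketch. The candidate-set functions below are hypothetical stand-ins (the paper's actual recognizers use prosodic clustering and the Association Mechanism); only the final set-intersection step is what the abstract describes.

```python
# Sketch of the two-stage fusion: each recognizer proposes a set of
# candidate emotions, and the final result is their intersection.
# The two recognizer functions are illustrative placeholders, NOT the
# authors' actual implementations.

def prosodic_candidates(features):
    """Stand-in for the clustering-based recognizer (prosody only)."""
    return {"joy", "neutral"}  # illustrative output

def linguistic_candidates(transcript):
    """Stand-in for the Association-Mechanism recognizer (text only)."""
    return {"joy", "sadness"}  # illustrative output

def fuse(features, transcript):
    """Final result: intersection of the two candidate-emotion sets."""
    return prosodic_candidates(features) & linguistic_candidates(transcript)

print(fuse(None, "what a lovely day"))  # -> {'joy'}
```

Taking the intersection keeps only emotions that both information sources agree on, which is why the combined recognizer can outperform either source alone.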

Experimental results showed that the proposed method achieved higher performance than either prosodic-only or linguistic-only emotion recognition. Against manually labeled data, its F-measure was 32.6%, while the average F-measure of labels given by other human annotators was 42.9%. The proposed method thus performed at 75.9% of human ability.
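The human-relative figure reported above is simply the ratio of the system's F-measure to the average human F-measure; a quick check of the arithmetic:

```python
# Ratio of system F-measure to average human (inter-annotator) F-measure.
system_f = 32.6  # proposed method vs. manually labeled data, in percent
human_f = 42.9   # average F-measure of other human annotators, in percent

ratio = system_f / human_f * 100
print(f"{ratio:.1f}%")  # roughly 76%; the paper reports 75.9%,
                        # presumably computed from unrounded values
```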




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Suzuki, M., Tsuchiya, S., Ren, F. (2011). A Novel Emotion Recognizer from Speech Using Both Prosodic and Linguistic Features. In: König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2011. Lecture Notes in Computer Science(), vol 6881. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23851-2_47


  • Print ISBN: 978-3-642-23850-5

  • Online ISBN: 978-3-642-23851-2

  • eBook Packages: Computer Science; Computer Science (R0)
