Abstract
Emotion recognition from speech generally relies on prosodic information. However, utterances expressing different emotions can have similar prosodic features, so it is difficult to recognize emotion using prosodic features alone.
In this paper, we propose a novel approach to emotion recognition that considers both prosodic and linguistic features. First, a set of possible emotions is output by a clustering-based emotion recognizer that uses only prosodic features. Then, a transcript produced by a speech recognizer is input to a second emotion recognizer based on the “Association Mechanism,” which outputs a set of possible emotions using only linguistic information. Finally, the intersection of the two sets of possible emotions is taken as the final result.
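The fusion step described above can be sketched as a simple set intersection. This is an illustrative sketch only, not the authors' implementation; the emotion labels and the two candidate sets are hypothetical placeholders.

```python
# Illustrative sketch of the fusion step: the final result is the
# intersection of the candidate-emotion sets produced by the
# prosodic and linguistic recognizers. Labels are hypothetical.

def fuse(prosodic_candidates: set, linguistic_candidates: set) -> set:
    """Return the emotions supported by both recognizers."""
    return prosodic_candidates & linguistic_candidates

# Hypothetical outputs of the two recognizers for one utterance:
prosodic = {"anger", "joy", "sadness"}
linguistic = {"joy", "surprise"}

print(fuse(prosodic, linguistic))  # {'joy'}
```

An empty intersection would mean the two recognizers disagree entirely; how such cases are resolved is a design decision not covered by this sketch.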
Experimental results showed that the proposed method achieved higher performance than emotion recognition based on prosodic or linguistic features alone. Compared against manually labeled data, the F-measure was 32.6%, while the average F-measure of labels given by other human annotators was 42.9%. This means the proposed method performed at 75.9% of human ability.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Suzuki, M., Tsuchiya, S., Ren, F. (2011). A Novel Emotion Recognizer from Speech Using Both Prosodic and Linguistic Features. In: König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2011. Lecture Notes in Computer Science(), vol 6881. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23851-2_47
Print ISBN: 978-3-642-23850-5
Online ISBN: 978-3-642-23851-2