Emotion Recognition in Chinese Natural Speech by Combining Prosody and Voice Quality Features

Zhang, Shiqing

doi:10.1007/978-3-540-87734-9_52

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5264)

Included in the following conference series:

International Symposium on Neural Networks

Abstract

Recognizing human emotion from speech is a challenging yet important problem in speech technology. In this paper, in addition to the prosody features derived from emotional speech, voice quality features are extracted as new emotional features to improve emotion recognition. Using a support vector machine (SVM) classifier, four emotions (anger, joy, sadness, and neutral) from a Chinese natural emotional speech corpus are discriminated by combining prosody and voice quality features. The experimental results show that combining prosody and voice quality features yields an overall emotion recognition accuracy of 76%, an improvement of approximately 10% over using prosody features alone. The results also show that voice quality features are effective emotional features and complement prosody features in improving emotion recognition.
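The combination step described in the abstract can be illustrated with a minimal sketch of feature-level fusion followed by a multi-class linear SVM. Everything here is an assumption made for illustration, not the author's implementation: the feature dimensions (3 prosody, 2 voice quality), the synthetic data, the z-score normalization, the one-vs-rest scheme, and the sub-gradient training loop are all hypothetical stand-ins, since the abstract does not specify the feature set, kernel, or training procedure.

```python
import random

EMOTIONS = ["anger", "joy", "sadness", "neutral"]

def combine(prosody, voice_quality):
    """Feature-level fusion: concatenate the two feature vectors."""
    return list(prosody) + list(voice_quality)

def zscore_fit(rows):
    """Per-dimension mean and standard deviation (std falls back to 1.0 if zero)."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = [(sum((r[d] - means[d]) ** 2 for r in rows) / n) ** 0.5 or 1.0
            for d in range(dims)]
    return means, stds

def zscore_apply(row, means, stds):
    return [(x - m) / s for x, m, s in zip(row, means, stds)]

def train_linear_svm(rows, targets, epochs=50, eta=0.01, lam=0.01, seed=0):
    """Binary linear SVM (targets are +1/-1), trained by stochastic
    sub-gradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(rows[0]), 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = rows[i], targets[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            w = [wj - eta * lam * wj for wj in w]        # regularization step
            if margin < 1:                               # hinge loss is active
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
    return w, b

def train_one_vs_rest(rows, labels):
    """One binary SVM per emotion: that emotion versus all others."""
    return {c: train_linear_svm(rows, [1 if y == c else -1 for y in labels])
            for c in sorted(set(labels))}

def predict(models, row):
    """Pick the emotion whose SVM gives the largest decision value."""
    def score(c):
        w, b = models[c]
        return sum(wj * xj for wj, xj in zip(w, row)) + b
    return max(models, key=score)

def make_toy_corpus(n_per_class=30, seed=1):
    """Synthetic placeholder data; NOT the Chinese corpus used in the paper."""
    rng = random.Random(seed)
    prosody, quality, labels = [], [], []
    for k, emotion in enumerate(EMOTIONS):
        for _ in range(n_per_class):
            prosody.append([rng.gauss(k, 1.0) for _ in range(3)])
            quality.append([rng.gauss((k * 2) % 3, 1.0) for _ in range(2)])
            labels.append(emotion)
    return prosody, quality, labels

prosody, quality, labels = make_toy_corpus()
fused = [combine(p, q) for p, q in zip(prosody, quality)]
means, stds = zscore_fit(fused)
fused = [zscore_apply(r, means, stds) for r in fused]
models = train_one_vs_rest(fused, labels)
predictions = [predict(models, r) for r in fused]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(f"training accuracy on toy data: {accuracy:.2f}")
```

Training the same classifier on the prosody columns alone and comparing the two accuracies mirrors the comparison reported in the abstract, although numbers obtained on this toy data say nothing about the paper's results.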

Author information

Authors and Affiliations

School of Physics and Electronic Engineering, Taizhou University, 318000, Taizhou, China
Shiqing Zhang

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Fuchun Sun
Institute TAMS (Technical Aspects of Multimodal Systems), Department of Informatics, University of Hamburg, Vogt-Koelln-Straße 30, 22527, Hamburg, Germany
Jianwei Zhang
Intel China Research Center, 8/F, Peking University, Department of Machine Intelligence, 100871, Beijing, China
Ying Tan
Department of Mathematics, Southeast University, 210096, Nanjing, China
Jinde Cao
Departamento de Control Automático, CINVESTAV-IPN, A.P. 14-740, Av.IPN 2508, 07360, México D.F., México
Wen Yu

Cite this paper

Zhang, S. (2008). Emotion Recognition in Chinese Natural Speech by Combining Prosody and Voice Quality Features. In: Sun, F., Zhang, J., Tan, Y., Cao, J., Yu, W. (eds) Advances in Neural Networks - ISNN 2008. ISNN 2008. Lecture Notes in Computer Science, vol 5264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87734-9_52

DOI: https://doi.org/10.1007/978-3-540-87734-9_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87733-2
Online ISBN: 978-3-540-87734-9
eBook Packages: Computer Science, Computer Science (R0)
