
Emotion Recognition in Chinese Natural Speech by Combining Prosody and Voice Quality Features

Conference paper in Advances in Neural Networks - ISNN 2008 (ISNN 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5264)

Abstract

Recognizing human emotion from speech is a challenging yet important problem in speech technology. In this paper, in addition to prosody features derived from emotional speech, voice quality features are proposed as new emotional features to improve emotion recognition. Using a support vector machine (SVM) classifier, four emotions (anger, joy, sadness, and neutral) from a Chinese natural emotional speech corpus are discriminated by combining prosody and voice quality features. Experimental results show that combining prosody and voice quality features yields an overall emotion recognition accuracy of 76%, an improvement of approximately 10% over using prosody features alone. The results also indicate that voice quality features are effective emotional features and complement prosody features in improving emotion recognition.
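The abstract describes feature-level fusion: prosody features and voice quality features are combined into one vector per utterance and passed to an SVM. The Python sketch below illustrates that kind of pipeline under explicit assumptions. The specific statistics (F0 mean, standard deviation and range; energy mean and standard deviation), the use of local jitter and shimmer as voice quality measures, z-score normalization, and a one-vs-rest linear SVM trained with the Pegasos stochastic subgradient method are all illustrative choices, not the paper's documented feature set or classifier configuration. Every function name here is hypothetical.

```python
import random
from statistics import mean, pstdev


def prosody_features(f0, energy):
    """Hypothetical utterance-level prosody statistics from frame-level
    F0 (Hz) and energy. Frames with f0 == 0 are treated as unvoiced."""
    voiced = [f for f in f0 if f > 0]
    return [mean(voiced), pstdev(voiced), max(voiced) - min(voiced),
            mean(energy), pstdev(energy)]


def voice_quality_features(periods, amplitudes):
    """Local jitter and shimmer (assumed voice quality measures): mean absolute
    difference between consecutive pitch periods (or peak amplitudes),
    divided by the mean period (or mean amplitude)."""
    def relative_perturbation(seq):
        diffs = [abs(a - b) for a, b in zip(seq, seq[1:])]
        return mean(diffs) / mean(seq)
    return [relative_perturbation(periods), relative_perturbation(amplitudes)]


def fuse(prosody, voice_quality):
    """Feature-level fusion: concatenate the two vectors into one."""
    return list(prosody) + list(voice_quality)


def fit_zscore(rows):
    """Per-dimension mean and std, so Hz-scale and ratio-scale features are
    comparable; a zero std is replaced by 1.0 to avoid division by zero."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def apply_zscore(row, mus, sds):
    return [(x - m) / s for x, m, s in zip(row, mus, sds)]


def _dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=1000, seed=0):
    """Binary linear SVM trained with Pegasos (stochastic subgradient descent
    on the regularized hinge loss); labels y are +1 / -1. A constant 1.0 is
    appended to each sample so the last weight acts as the bias."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(Xb)), len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * _dot(w, Xb[i]) < 1  # margin checked before shrinking
            w = [(1.0 - eta * lam) * wi for wi in w]
            if violated:
                w = [wi + eta * y[i] * xi for wi, xi in zip(w, Xb[i])]
    return w


def train_one_vs_rest(X, labels, **kwargs):
    """One binary SVM per emotion class (e.g. anger, joy, sadness, neutral)."""
    return {c: train_linear_svm(X, [1 if l == c else -1 for l in labels], **kwargs)
            for c in sorted(set(labels))}


def predict(models, x):
    """Pick the class whose SVM gives the largest decision value."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: _dot(models[c], xb))
```

In use, each utterance would be mapped to `fuse(prosody_features(...), voice_quality_features(...))`, normalized with statistics fitted on the training set only, and then classified with `predict`. The abstract does not state the kernel or solver used in the paper; a library SVM (for example libsvm, or scikit-learn's `SVC` with an RBF kernel) would be the more typical choice in practice, and the hand-written trainer here only keeps the sketch dependency-free.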

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, S. (2008). Emotion Recognition in Chinese Natural Speech by Combining Prosody and Voice Quality Features. In: Sun, F., Zhang, J., Tan, Y., Cao, J., Yu, W. (eds) Advances in Neural Networks - ISNN 2008. ISNN 2008. Lecture Notes in Computer Science, vol 5264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87734-9_52

  • DOI: https://doi.org/10.1007/978-3-540-87734-9_52

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87733-2

  • Online ISBN: 978-3-540-87734-9

  • eBook Packages: Computer Science, Computer Science (R0)