Sound Processing Features for Speaker-Dependent and Phrase-Independent Emotion Recognition in Berlin Database

Chapter in Information Systems Development

Abstract

An emotion recognition framework based on sound processing could improve services in human–computer interaction. Various quantitative speech features obtained from sound processing of acted speech were tested to determine whether they are sufficient to discriminate between seven emotions. Multilayer perceptrons were trained to classify gender and emotion from a 24-element input vector, which summarizes the prosody of the speaker over the entire sentence using statistics of sound features. Several experiments were performed, and the results are presented in detail. Emotion recognition was successful when the speakers and utterances were “known” to the classifier. However, severe misclassifications occurred in the utterance-independent setting. Nevertheless, the proposed feature vector achieved promising results for utterance-independent recognition of high- and low-arousal emotions.
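As an illustration of the pipeline the abstract describes, the following Python sketch builds a fixed-length utterance vector from summary statistics of frame-level sound contours and trains a multilayer perceptron on it. This is a minimal sketch, not the authors' implementation: the choice of four contours and six statistics (to reach 24 features), the scikit-learn MLPClassifier, and the helper names `contour_stats` and `utterance_vector` are all illustrative assumptions, and random arrays stand in for real pitch and energy tracks.

```python
# Minimal sketch of the approach described in the abstract: per-utterance
# statistics of frame-level sound features form a 24-element vector that is
# fed to a multilayer perceptron. The exact 24 features are not given in the
# abstract; four contours x six statistics is an illustrative assumption.
import numpy as np
from sklearn.neural_network import MLPClassifier

def contour_stats(x: np.ndarray) -> np.ndarray:
    """Six summary statistics of one frame-level contour (e.g., pitch)."""
    x = x[np.isfinite(x)]  # drop unvoiced/undefined frames marked as NaN
    return np.array([x.mean(), x.std(), x.min(), x.max(),
                     x.max() - x.min(), np.median(x)])

def utterance_vector(contours: list) -> np.ndarray:
    """Concatenate the statistics of four contours into a 24-element vector."""
    return np.concatenate([contour_stats(c) for c in contours])

# Toy data: random arrays stand in for the pitch/energy tracks of 140
# utterances; replace these with features extracted from real recordings.
rng = np.random.default_rng(0)
X = np.stack([utterance_vector([rng.normal(size=200) for _ in range(4)])
              for _ in range(140)])            # shape (140, 24)
y = rng.integers(0, 7, size=140)               # seven emotion labels

clf = MLPClassifier(hidden_layer_sizes=(24,), max_iter=2000, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:5]))
```

In a real experiment the random contours would be replaced by pitch and energy tracks extracted from the Berlin database recordings, and a held-out speaker or utterance split would expose the speaker- and phrase-dependence effects the abstract reports.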

Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Anagnostopoulos, C.N., Vovoli, E. (2009). Sound Processing Features for Speaker-Dependent and Phrase-Independent Emotion Recognition in Berlin Database. In: Papadopoulos, G., Wojtkowski, W., Wojtkowski, G., Wrycza, S., Zupancic, J. (eds) Information Systems Development. Springer, Boston, MA. https://doi.org/10.1007/b137171_43

  • DOI: https://doi.org/10.1007/b137171_43

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-84809-9

  • Online ISBN: 978-0-387-84810-5

  • eBook Packages: Computer Science, Computer Science (R0)
