
Multi-stage classification of emotional speech motivated by a dimensional emotion model


Abstract

This paper deals with speech emotion analysis in the context of the growing application potential of affective computing. Unlike most work in the literature, which relies mainly on classical frequency- and energy-based features combined with a single global classifier, we propose new harmonic and Zipf-based features for better characterization of speech emotion in the valence dimension, together with a multi-stage classification scheme driven by a dimensional emotion model for better discrimination between emotional classes. Evaluated on the Berlin dataset with 68 features and six emotion states, our approach proves effective, achieving a 68.60% classification rate, which rises to 71.52% when a gender classification stage is applied first. On the DES dataset with five emotion states, our approach achieves an 81% recognition rate, whereas the best performance reported in the literature on the same dataset is, to our knowledge, 76.15%.
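To make the multi-stage idea concrete, here is a minimal sketch of a dimensional-model-driven classifier: a first stage separates high- from low-arousal speech, and a second stage resolves the emotion (a valence-oriented distinction) within each arousal group. The arousal grouping of the six emotion labels, the choice of random forests, and the random stand-in feature vectors are all illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only: the stage layout, classifier choice, and the
# arousal grouping below are assumptions for demonstration, not the
# authors' implementation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assumed arousal grouping of six Berlin-style emotion labels, loosely
# following a circumplex-type dimensional model.
HIGH_AROUSAL = {"anger", "joy", "fear"}
LOW_AROUSAL = {"sadness", "boredom", "neutral"}


class TwoStageEmotionClassifier:
    """Stage 1: high vs. low arousal. Stage 2: emotion within each group."""

    def __init__(self):
        self.arousal_clf = RandomForestClassifier(random_state=0)
        self.high_clf = RandomForestClassifier(random_state=0)
        self.low_clf = RandomForestClassifier(random_state=0)

    def fit(self, X, y):
        high = np.array([label in HIGH_AROUSAL for label in y])
        self.arousal_clf.fit(X, high)          # stage 1: arousal split
        self.high_clf.fit(X[high], y[high])    # stage 2a: within high arousal
        self.low_clf.fit(X[~high], y[~high])   # stage 2b: within low arousal
        return self

    def predict(self, X):
        high = self.arousal_clf.predict(X).astype(bool)
        pred = np.empty(len(X), dtype=object)
        if high.any():
            pred[high] = self.high_clf.predict(X[high])
        if (~high).any():
            pred[~high] = self.low_clf.predict(X[~high])
        return pred


# Smoke test with random vectors standing in for the paper's 68 features
# (harmonic, Zipf-based, frequency and energy features).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 68))
y = rng.choice(sorted(HIGH_AROUSAL | LOW_AROUSAL), size=120)
print(TwoStageEmotionClassifier().fit(X, y).predict(X[:5]))
```

A gender stage, as in the paper's best-performing configuration, would simply wrap this classifier in one more split of the same shape, training a separate two-stage model per gender.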



Acknowledgment

This work was supported by a scholarship awarded by the French government from 2004 to 2007, and was partly funded by the PRA project Apollo under number SI04-02 and by a PICS grant from CNRS under number 3597.


Corresponding author

Correspondence to Emmanuel Dellandrea.


About this article

Cite this article

Xiao, Z., Dellandrea, E., Dou, W. et al. Multi-stage classification of emotional speech motivated by a dimensional emotion model. Multimed Tools Appl 46, 119–145 (2010). https://doi.org/10.1007/s11042-009-0319-3
