Abstract
This paper addresses speech emotion analysis in the context of growing awareness of the wide application potential of affective computing. Unlike most work in the literature, which relies mainly on classical frequency- and energy-based features together with a single global classifier for emotion recognition, we propose new harmonic and Zipf-based features for better characterization of speech emotion along the valence dimension, and a multi-stage classification scheme driven by a dimensional emotion model for better discrimination between emotional classes. Evaluated on the Berlin dataset with 68 features and six emotion states, our approach proves effective, achieving a 68.60% classification rate, which rises to 71.52% when gender classification is applied first. On the DES dataset with five emotion states, our approach achieves an 81% recognition rate, whereas the best performance reported in the literature on the same dataset is, to our knowledge, 76.15%.
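To make the multi-stage idea concrete, the sketch below shows one way a classifier driven by a dimensional emotion model could be organised: a first stage separates high- from low-arousal utterances, and a second stage resolves the individual emotion within each arousal group. This is a minimal sketch only; the arousal groupings, the SVM back-end and the feature matrices are illustrative assumptions, not the exact features or classifiers used in the paper.

import numpy as np
from sklearn.svm import SVC

# Hypothetical grouping of the six Berlin emotion states along the arousal
# axis of a dimensional (arousal/valence) emotion model.
HIGH_AROUSAL = {"anger", "happiness", "fear"}


class TwoStageEmotionClassifier:
    """Stage 1 separates high- from low-arousal speech; stage 2 resolves
    the individual emotion within each arousal group."""

    def __init__(self):
        self.arousal_clf = SVC(kernel="rbf")  # stage 1: arousal split
        self.high_clf = SVC(kernel="rbf")     # stage 2a: emotions within high arousal
        self.low_clf = SVC(kernel="rbf")      # stage 2b: emotions within low arousal

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        high = np.array([label in HIGH_AROUSAL for label in y])
        self.arousal_clf.fit(X, high)
        self.high_clf.fit(X[high], y[high])
        self.low_clf.fit(X[~high], y[~high])
        return self

    def predict(self, X):
        X = np.asarray(X)
        high = self.arousal_clf.predict(X).astype(bool)
        out = np.empty(len(X), dtype=object)
        if high.any():
            out[high] = self.high_clf.predict(X[high])
        if (~high).any():
            out[~high] = self.low_clf.predict(X[~high])
        return out


# Usage with a hypothetical 68-dimensional feature matrix:
#   clf = TwoStageEmotionClassifier().fit(train_features, train_labels)
#   predictions = clf.predict(test_features)

Splitting the decision this way lets each second-stage classifier focus on the valence distinctions that are hardest for a single global classifier, which is the motivation behind the harmonic and Zipf-based features described in the abstract.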
Acknowledgment
This work was supported by a scholarship awarded by the French government from 2004 to 2007, and partly by the PRA project Apollo under number SI04-02 and PICS grant number 3597 from CNRS.
Cite this article
Xiao, Z., Dellandrea, E., Dou, W. et al. Multi-stage classification of emotional speech motivated by a dimensional emotion model. Multimed Tools Appl 46, 119–145 (2010). https://doi.org/10.1007/s11042-009-0319-3