
Mandarin emotion recognition combining acoustic and emotional point information

Published in: Applied Intelligence

Abstract

In this contribution, we introduce a novel approach that combines acoustic information and emotional point information for robust automatic recognition of a speaker’s emotion. Six discrete emotional states are recognized in this work. First, a multi-level model for emotion recognition from acoustic features is presented; the derived features are selected by Fisher rate to distinguish between different types of emotions. Second, a novel emotional point model for Mandarin is established using Support Vector Machines and Hidden Markov Models. This model contains 28 emotional syllables that carry rich emotional information. Finally, the acoustic information and the emotional point information are integrated by a soft decision strategy. Experimental results show that applying emotional point information to speech emotion recognition is effective.
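Two of the ingredients named in the abstract can be illustrated in a few lines: Fisher-rate feature selection (ranking features by between-class versus within-class variance) and soft-decision fusion (combining the posterior probabilities of two recognizers by a weighted sum). The sketch below is a minimal illustration under assumed conventions, not the authors' implementation; the fusion weight `w` and all variable names are hypothetical.

```python
import numpy as np

def fisher_rate(features, labels):
    """Per-feature Fisher rate: between-class variance divided by
    within-class variance. Higher values indicate features that
    separate the emotion classes better."""
    classes = np.unique(labels)
    overall_mean = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in classes:
        x = features[labels == c]
        class_mean = x.mean(axis=0)
        between += len(x) * (class_mean - overall_mean) ** 2
        within += ((x - class_mean) ** 2).sum(axis=0)
    # Guard against zero within-class variance.
    return between / np.maximum(within, 1e-12)

def soft_decision(p_acoustic, p_point, w=0.6):
    """Soft-decision fusion: weighted sum of the posterior vectors
    from the acoustic model and the emotional point model.
    The weight w is a hypothetical tuning parameter."""
    fused = w * np.asarray(p_acoustic) + (1 - w) * np.asarray(p_point)
    return int(np.argmax(fused))
```

For example, a feature whose values cluster tightly within each emotion class but differ across classes receives a high Fisher rate and survives selection, while a feature that varies as much within a class as between classes is discarded.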



Acknowledgements

This research is supported by the International Science and Technology Cooperation Program of China (No. 2010DFA11990) and the National Natural Science Foundation of China (No. 61103097).


Corresponding author

Correspondence to Xia Mao.


Cite this article

Chen, L., Mao, X., Wei, P. et al. Mandarin emotion recognition combining acoustic and emotional point information. Appl Intell 37, 602–612 (2012). https://doi.org/10.1007/s10489-012-0352-1
