
Mandarin emotion recognition combining acoustic and emotional point information

Published in: Applied Intelligence

Abstract

In this contribution, we introduce a novel approach that combines acoustic information and emotional point information for robust automatic recognition of a speaker’s emotion. Six discrete emotional states are recognized in this work. First, a multi-level model for emotion recognition from acoustic features is presented; the derived features are selected by Fisher rate to distinguish between different types of emotions. Second, a novel emotional point model for Mandarin is established using Support Vector Machines and Hidden Markov Models. This model contains 28 emotional syllables that carry rich emotional information. Finally, the acoustic information and the emotional point information are integrated by a soft decision strategy. Experimental results show that applying emotional point information to speech emotion recognition is effective.
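Two of the ingredients named in the abstract can be illustrated in a few lines: Fisher-rate feature selection (ranking features by between-class versus within-class variance) and soft-decision fusion (combining the posterior probabilities of two recognizers by a weighted sum). The sketch below is a minimal illustration under assumed conventions, not the authors' implementation; the fusion weight `w` and all variable names are hypothetical.

```python
import numpy as np

def fisher_rate(features, labels):
    """Per-feature Fisher rate: between-class variance divided by
    within-class variance. Higher values indicate features that
    separate the emotion classes better."""
    classes = np.unique(labels)
    overall_mean = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in classes:
        x = features[labels == c]
        class_mean = x.mean(axis=0)
        between += len(x) * (class_mean - overall_mean) ** 2
        within += ((x - class_mean) ** 2).sum(axis=0)
    # Guard against zero within-class variance.
    return between / np.maximum(within, 1e-12)

def soft_decision(p_acoustic, p_point, w=0.6):
    """Soft-decision fusion: weighted sum of the posterior vectors
    from the acoustic model and the emotional point model.
    The weight w is a hypothetical tuning parameter."""
    fused = w * np.asarray(p_acoustic) + (1 - w) * np.asarray(p_point)
    return int(np.argmax(fused))
```

For example, a feature whose values cluster tightly within each emotion class but differ across classes receives a high Fisher rate and survives selection, while a feature that varies as much within a class as between classes is discarded.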



Acknowledgements

This research is supported by the International Science and Technology Cooperation Program of China (No. 2010DFA11990) and the National Natural Science Foundation of China (No. 61103097).


Corresponding author

Correspondence to Xia Mao.


Cite this article

Chen, L., Mao, X., Wei, P. et al. Mandarin emotion recognition combining acoustic and emotional point information. Appl Intell 37, 602–612 (2012). https://doi.org/10.1007/s10489-012-0352-1
