Combining Acoustic Features for Improved Emotion Recognition in Mandarin Speech

Pao, Tsang-Long; Chen, Yu-Te; Yeh, Jun-Heng; Liao, Wen-Yuan

doi:10.1007/11573548_36

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3784)

Included in the following conference series:

International Conference on Affective Computing and Intelligent Interaction

Abstract

Combining different feature streams to obtain more accurate results is a well-known technique. The basic argument is that if the recognition errors of systems using the individual streams occur at different points, a combined system has at least a chance of correcting some of these errors by reference to the other streams. In emotional speech recognition, there are many ways in which this general principle can be applied. In this paper, we propose using feature selection and feature combination to improve speaker-dependent emotion recognition in Mandarin speech. Five basic emotions are investigated: anger, boredom, happiness, neutral, and sadness. Combining multiple feature streams is clearly highly beneficial in our system. The best accuracy in recognizing the five emotions, 99.44%, is achieved using the MFCC, LPCC, RastaPLP, and LFPC feature streams with the nearest class mean classifier.
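
The abstract does not say how the streams are combined or which distance the nearest class mean classifier uses. The following is only a minimal sketch, assuming that each stream yields one fixed-length vector per utterance, that streams are combined by simple concatenation, and that the classifier assigns the class whose mean is closest in Euclidean distance. The names `combine_streams` and `NearestClassMean` and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np


def combine_streams(streams):
    """Concatenate per-utterance feature vectors from several streams.

    Assumption: every stream is an array of shape (n_utterances, n_dims_i),
    with rows aligned across streams.
    """
    return np.concatenate([np.asarray(s, dtype=float) for s in streams], axis=1)


class NearestClassMean:
    """Assign each sample to the class with the closest mean vector."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One mean vector per class, stacked in the order of classes_.
        self.means_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Squared Euclidean distance from every sample to every class mean.
        diff = X[:, None, :] - self.means_[None, :, :]
        dist = (diff ** 2).sum(axis=2)
        return self.classes_[dist.argmin(axis=1)]


# Toy, made-up numbers standing in for two feature streams (not real data).
stream_a = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
stream_b = np.array([[1.0], [1.0], [5.0], [5.0]])
labels = ["neutral", "neutral", "anger", "anger"]

X_train = combine_streams([stream_a, stream_b])
clf = NearestClassMean().fit(X_train, labels)
predicted = clf.predict(np.array([[1.0, 1.0, 1.0], [9.0, 9.0, 5.0]]))
```

In practice the streams would probably need to be normalized to comparable scales before concatenation, since a Euclidean distance is otherwise dominated by the stream with the largest numeric range. That step is left out here to keep the sketch short.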

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Tatung University,
Tsang-Long Pao, Yu-Te Chen, Jun-Heng Yeh & Wen-Yuan Liao

Editor information

Editors and Affiliations

National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences,
Jianhua Tao
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
MIT Media Laboratory, 20 Ames Street, 02139, Cambridge, MA, USA
Rosalind W. Picard

About this paper

Cite this paper

Pao, TL., Chen, YT., Yeh, JH., Liao, WY. (2005). Combining Acoustic Features for Improved Emotion Recognition in Mandarin Speech. In: Tao, J., Tan, T., Picard, R.W. (eds) Affective Computing and Intelligent Interaction. ACII 2005. Lecture Notes in Computer Science, vol 3784. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573548_36

DOI: https://doi.org/10.1007/11573548_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29621-8
Online ISBN: 978-3-540-32273-3
eBook Packages: Computer Science, Computer Science (R0)