Speech Emotion Classification on a Riemannian Manifold

Ye, Chengxi; Liu, Jia; Chen, Chun; Song, Mingli; Bu, Jiajun

doi:10.1007/978-3-540-89796-5_7

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5353)

Included in the following conference series:

Pacific-Rim Conference on Multimedia

Abstract

We present a novel algorithm for speech emotion classification. In contrast to previous methods, we additionally consider the relations between simple features by using covariance matrices as new feature descriptors. Since non-singular covariance matrices do not form a linear space, we endow the space with an affine-invariant metric, which turns it into a Riemannian manifold. We then approximate the manifold by its tangent space. Classification is performed in the tangent space, and a generalized principal component analysis is presented. We test the algorithm on speech emotion classification, and the experimental results show an improvement of around 13% (+3% with PCA) in recognition accuracy. Based on this, we are able to train one simple model that accurately differentiates the emotions of both genders.
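
The pipeline outlined in the abstract (covariance descriptor, affine-invariant metric, tangent-space approximation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the ridge term, the choice of the identity as the reference point, and the synthetic random "frames" are all assumptions made for illustration. The formulas are the standard affine-invariant log map and distance for symmetric positive-definite matrices.

```python
import numpy as np


def _sym_fn(mat, fn):
    """Apply fn to the eigenvalues of a symmetric matrix."""
    mat = (mat + mat.T) / 2.0  # guard against round-off asymmetry
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * fn(vals)) @ vecs.T


def covariance_descriptor(frames, ridge=1e-6):
    """Covariance of frame-level features; frames is (n_frames, n_features)."""
    cov = np.cov(frames, rowvar=False)
    # Assumption: a small ridge keeps the matrix non-singular.
    return cov + ridge * np.eye(cov.shape[0])


def _whitened_log(ref, cov):
    """log(ref^{-1/2} cov ref^{-1/2}): the log map in whitened coordinates."""
    ref_isqrt = _sym_fn(ref, lambda v: 1.0 / np.sqrt(v))
    return _sym_fn(ref_isqrt @ cov @ ref_isqrt, np.log)


def affine_invariant_distance(a, b):
    """Geodesic distance between two SPD matrices under the affine-invariant metric."""
    return np.linalg.norm(_whitened_log(a, b), ord="fro")


def tangent_vector(ref, cov):
    """Flatten the tangent matrix at ref into a Euclidean feature vector.

    Off-diagonal entries are scaled by sqrt(2) so that the vector's
    Euclidean norm equals the Frobenius norm of the tangent matrix.
    """
    log_m = _whitened_log(ref, cov)
    rows, cols = np.triu_indices(log_m.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return log_m[rows, cols] * weights


# Synthetic stand-ins for two utterances with 4 frame-level features each.
rng = np.random.default_rng(0)
utt_a = rng.normal(size=(200, 4))
utt_b = rng.normal(size=(200, 4)) @ np.diag([1.0, 2.0, 0.5, 1.5])

cov_a = covariance_descriptor(utt_a)
cov_b = covariance_descriptor(utt_b)

# Assumption: the identity serves as the reference point of the tangent space.
ref = np.eye(4)
vec_a = tangent_vector(ref, cov_a)
vec_b = tangent_vector(ref, cov_b)
print(vec_a.shape, affine_invariant_distance(cov_a, cov_b))
```

Once every utterance is mapped to such a vector, any ordinary Euclidean classifier or PCA can be applied to the vectors. A mean of the training covariances is a common alternative reference point to the identity used here.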

Author information

Authors and Affiliations

College of Computer Science, Zhejiang University, Hangzhou, P. R. China, 310027
Chengxi Ye, Jia Liu, Chun Chen, Mingli Song & Jiajun Bu

Editor information

Editors and Affiliations

Department of Engineering Science, National Cheng Kung University, No.1, University Road, 701, Tainan City, Taiwan
Yueh-Min Ray Huang
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95, Zhongguancun East Road, 100190, Beijing, China
Changsheng Xu
Institute of Biomedical Engineering, National Cheng Kung University, No. 1, University Road, 701, Tainan City, Taiwan
Kuo-Sheng Cheng
Department of Electrical Engineering, National Cheng Kung University, No. 1, University Road, 701, Tainan City, Taiwan
Jar-Ferr Kevin Yang
Department of Electrical and Computer Engineering, Concordia University, S-EV005.139, 1515 St. Catherine West, Montreal, H4G 2W1, Quebec, Canada
M. N. S. Swamy
Microsoft Research Asia, 5/F, Beijing Sigma Center, No. 49, Zhichun Road, Hai Dian District, 100080, Beijing, China
Shipeng Li
Department of Information Management, National Kaohsiung University of Applied Sciences, No. 415, Jiangong Road, Sanmin District, 80778, Kaohsiung, Taiwan
Jen-Wen Ding

About this paper

Cite this paper

Ye, C., Liu, J., Chen, C., Song, M., Bu, J. (2008). Speech Emotion Classification on a Riemannian Manifold. In: Huang, YM.R., et al. Advances in Multimedia Information Processing - PCM 2008. PCM 2008. Lecture Notes in Computer Science, vol 5353. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89796-5_7

DOI: https://doi.org/10.1007/978-3-540-89796-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89795-8
Online ISBN: 978-3-540-89796-5
eBook Packages: Computer Science, Computer Science (R0)