Abstract
We present a novel algorithm for speech emotion classification. In contrast to previous methods, we also consider the relations between simple features by using covariance matrices as feature descriptors. Since non-singular covariance matrices do not lie in a linear space, we endow the space with an affine-invariant metric, which turns it into a Riemannian manifold. We then approximate the manifold by its tangent space. Classification is performed in the tangent space, and a generalized principal component analysis is presented. We test the algorithm on speech emotion classification, and the experimental results show an improvement of around 13% (+3% with PCA) in recognition accuracy. Based on this, we are able to train one simple model that accurately differentiates emotions for both genders.
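The pipeline outlined in the abstract (covariance descriptor, affine-invariant metric, tangent-space approximation) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the regularization constant `eps`, the choice of base point, and the sqrt(2) scaling of off-diagonal entries are all assumptions made for illustration, and the abstract does not say which per-frame features or classifier the paper uses.

```python
import numpy as np

def covariance_descriptor(frames, eps=1e-6):
    """Covariance of per-frame features; frames has shape (n_frames, d).

    eps (an assumed value) is added to the diagonal so the matrix is
    non-singular, as the manifold construction requires.
    """
    c = np.cov(frames, rowvar=False)
    return c + eps * np.eye(c.shape[0])

def _sym_fn(m, fn):
    """Apply fn to the eigenvalues of a symmetric matrix m."""
    w, v = np.linalg.eigh(m)
    return (v * fn(w)) @ v.T

def _inv_sqrt(m):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    return _sym_fn(m, lambda w: 1.0 / np.sqrt(w))

def affine_invariant_distance(a, b):
    """Geodesic distance between SPD matrices under the affine-invariant metric."""
    s_inv = _inv_sqrt(a)
    w = np.linalg.eigvalsh(s_inv @ b @ s_inv)
    return np.sqrt(np.sum(np.log(w) ** 2))

def tangent_vector(base, x):
    """Map SPD matrix x to a Euclidean vector in the tangent space at base.

    Computes log(base^-1/2 x base^-1/2) and flattens its upper triangle.
    Off-diagonal entries are scaled by sqrt(2) so that the Euclidean norm
    of the vector equals the affine-invariant distance from base to x.
    """
    s_inv = _inv_sqrt(base)
    y = _sym_fn(s_inv @ x @ s_inv, np.log)
    rows, cols = np.triu_indices(y.shape[0])
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return y[rows, cols] * scale
```

The resulting vectors live in an ordinary linear space, so a standard classifier or PCA could be applied to them; which base point to use (for example, a mean of the training covariances) is a design choice this sketch leaves open.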
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ye, C., Liu, J., Chen, C., Song, M., Bu, J. (2008). Speech Emotion Classification on a Riemannian Manifold. In: Huang, YM.R., et al. Advances in Multimedia Information Processing - PCM 2008. PCM 2008. Lecture Notes in Computer Science, vol 5353. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89796-5_7
DOI: https://doi.org/10.1007/978-3-540-89796-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89795-8
Online ISBN: 978-3-540-89796-5
eBook Packages: Computer Science, Computer Science (R0)