Dimensionality reduction-based spoken emotion recognition

Zhang, Shiqing; Zhao, Xiaoming

doi:10.1007/s11042-011-0887-x

Published: 04 October 2011

Volume 63, pages 615–646 (2013)
Multimedia Tools and Applications

Abstract

To improve the performance of spoken emotion recognition, nonlinear dimensionality reduction is needed for speech data that lie on a nonlinear manifold embedded in a high-dimensional acoustic space. This paper proposes a new supervised manifold learning algorithm for nonlinear dimensionality reduction, called modified supervised locally linear embedding (MSLLE), for spoken emotion recognition. MSLLE aims to enlarge the interclass distance while shrinking the intraclass distance, in order to improve the discriminating power and generalization ability of the low-dimensional embedded data representations. MSLLE is compared with three unsupervised dimensionality reduction methods, namely principal component analysis (PCA), locally linear embedding (LLE) and isometric mapping (Isomap), and with five supervised dimensionality reduction methods, namely linear discriminant analysis (LDA), supervised locally linear embedding (SLLE), local Fisher discriminant analysis (LFDA), neighborhood component analysis (NCA) and maximally collapsing metric learning (MCML), on spoken emotion recognition tasks. Experimental results on two emotional speech databases, the spontaneous Chinese database and the acted Berlin database, confirm the validity and promising performance of the proposed method.
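The idea of enlarging interclass distances before building the neighborhood graph can be illustrated with a small sketch. The abstract does not give the MSLLE distance formula, so the code below is not the authors' method. It assumes the simpler SLLE-style rule, in which a constant penalty `alpha * max(D)` is added to the distance between samples with different labels. The function names, the parameter `alpha`, and the synthetic two-class data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def supervised_distances(X, y, alpha=0.5):
    """SLLE-style distances (an assumption, not the paper's MSLLE rule):
    D'[i, j] = D[i, j] + alpha * max(D) * [y_i != y_j]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # 1.0 where the two samples carry different class labels, else 0.0.
    different = (y[:, None] != y[None, :]).astype(float)
    return D + alpha * D.max() * different

def same_class_neighbor_rate(D, y, k=3):
    """Fraction of the k nearest neighbors that share the sample's label."""
    y = np.asarray(y)
    D = D.copy()
    np.fill_diagonal(D, np.inf)  # a sample is not its own neighbor
    nn = np.argsort(D, axis=1)[:, :k]
    return float((y[nn] == y[:, None]).mean())

# Hypothetical toy data: two overlapping classes in a 5-dimensional space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(0.5, 1.0, (20, 5))])
y = np.array([0] * 20 + [1] * 20)

D_plain = supervised_distances(X, y, alpha=0.0)  # ordinary Euclidean distances
D_sup = supervised_distances(X, y, alpha=1.0)    # fully supervised distances

rate_plain = same_class_neighbor_rate(D_plain, y)
rate_sup = same_class_neighbor_rate(D_sup, y)
print(rate_plain, rate_sup)
```

With `alpha=1.0`, every interclass distance exceeds the largest original distance, so each sample's nearest neighbors all come from its own class. An LLE-type embedding built on such neighborhoods then tends to pull the classes apart. Labels are only available for training data, so test samples need a separate out-of-sample mapping.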

Acknowledgements

The authors would like to thank the anonymous reviewers and editors for their helpful comments and suggestions for improving this paper. This work is supported by the Zhejiang Provincial Natural Science Foundation of China under Grant Nos. Z1101048 and Y1111058.

Author information

Authors and Affiliations

School of Physics and Electronic Engineering, Taizhou University, Taizhou, 318000, People’s Republic of China
Shiqing Zhang
Department of Computer Science, Taizhou University, Taizhou, 318000, People’s Republic of China
Xiaoming Zhao

Corresponding author

Correspondence to Shiqing Zhang.

About this article

Cite this article

Zhang, S., Zhao, X. Dimensionality reduction-based spoken emotion recognition. Multimed Tools Appl 63, 615–646 (2013). https://doi.org/10.1007/s11042-011-0887-x

Published: 04 October 2011
Issue Date: April 2013
DOI: https://doi.org/10.1007/s11042-011-0887-x
