Spoken emotion recognition via locality-constrained kernel sparse representation

Zhao, Xiaoming; Zhang, Shiqing

doi:10.1007/s00521-014-1755-1

Original Article
Published: 29 October 2014

Volume 26, pages 735–744, (2015)

Neural Computing and Applications

Xiaoming Zhao¹ & Shiqing Zhang¹

Abstract

Spoken emotion recognition is an active research topic that has attracted extensive attention in signal processing, pattern recognition, and artificial intelligence. This paper proposes a new emotion classification method based on kernel sparse representation, named locality-constrained kernel sparse representation-based classification (LC-KSRC), for spoken emotion recognition. Because LC-KSRC integrates both sparsity and data locality in the kernel feature space, it can learn more discriminative sparse representation coefficients. The proposed method is compared with six representative emotion classification methods: linear discriminant classifier, K-nearest-neighbor, radial basis function neural networks, support vector machines, sparse representation-based classification, and kernel sparse representation-based classification. Experimental results on two publicly available emotional speech databases, the Berlin database and the Polish database, demonstrate the promising performance of the proposed method on spoken emotion recognition tasks, where it outperforms the other six methods.
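
The abstract does not state the LC-KSRC objective, so the following is only an illustrative sketch of the general idea of combining a kernel representation with a locality penalty, not the authors' algorithm. It assumes a Gaussian (RBF) kernel and swaps in a squared (l2) locality penalty, in the style of locality-constrained linear coding, in place of an l1 sparsity term, because that choice gives a closed-form solution. The function names, the parameters lam and gamma, and the small ridge term are all hypothetical choices made for this example.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def lc_kernel_classify(X_train, y_train, x, lam=0.1, gamma=0.5):
    """Classify x by class-wise reconstruction residual in kernel feature space.

    Assumed objective (not taken from the paper):
        min_c ||phi(x) - Phi c||^2 + lam * sum_i d_i^2 * c_i^2
    where d_i is the feature-space distance between x and training sample i.
    """
    K = rbf_kernel(X_train, X_train, gamma)              # Gram matrix of training data
    k_x = rbf_kernel(X_train, x[None, :], gamma).ravel()  # kernel values k(x_i, x)
    k_xx = 1.0                                            # RBF kernel of x with itself

    # Squared feature-space distances: k(x,x) - 2 k(x,x_i) + k(x_i,x_i).
    d2 = np.maximum(k_xx - 2.0 * k_x + np.diag(K), 0.0)

    # Setting the gradient to zero gives (K + lam * diag(d2)) c = k_x.
    # The tiny ridge keeps the system well conditioned (an added assumption).
    A = K + lam * np.diag(d2) + 1e-8 * np.eye(len(k_x))
    c = np.linalg.solve(A, k_x)

    # Assign x to the class whose coefficients reconstruct phi(x) best.
    best_label, best_res = None, np.inf
    for label in np.unique(y_train):
        idx = np.flatnonzero(y_train == label)
        c_j = c[idx]
        res = k_xx - 2.0 * c_j @ k_x[idx] + c_j @ K[np.ix_(idx, idx)] @ c_j
        if res < best_res:
            best_label, best_res = label, res
    return best_label, c
```

Calling lc_kernel_classify(X_train, y_train, x) returns the predicted label together with the coefficient vector. Far-away training samples receive a large penalty, so their coefficients shrink toward zero; this mimics the locality effect described in the abstract but does not produce exactly sparse coefficients the way an l1 term would.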

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant Nos. 61203257 and 61272261.

Author information

Authors and Affiliations

Institute of Image Processing and Pattern Recognition, Taizhou University, Taizhou, 318000, People’s Republic of China
Xiaoming Zhao & Shiqing Zhang

Corresponding author

Correspondence to Shiqing Zhang.

About this article

Cite this article

Zhao, X., Zhang, S. Spoken emotion recognition via locality-constrained kernel sparse representation. Neural Comput & Applic 26, 735–744 (2015). https://doi.org/10.1007/s00521-014-1755-1

Received: 16 January 2014
Accepted: 16 October 2014
Published: 29 October 2014
Issue Date: April 2015
DOI: https://doi.org/10.1007/s00521-014-1755-1
