Abstract
In practice, training and testing data often come from different datasets, which degrades speech emotion recognition rates. To tackle this problem, this paper proposes a novel speech emotion recognition method based on transfer principal component analysis (TPCA) and sparse coding. TPCA is first applied for feature dimension reduction; a sparse coding algorithm is then used to learn robust feature representations for both the labeled source corpus and the unlabeled target corpus. The proposed method is evaluated on two public datasets. Experimental results demonstrate that it significantly outperforms the automatic recognition method and obtains better performance than the state-of-the-art method.
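The two-stage pipeline described above (dimension reduction, then sparse coding of both corpora in a shared space) can be illustrated with a minimal sketch. This is not the TPCA algorithm of the paper, whose details are not given in the abstract: as a labeled stand-in, ordinary PCA is fitted on the pooled source and target features, and sparse codes are computed with ISTA against a fixed dictionary sampled from the projected data. All function names, the synthetic data, and the parameter values (`n_components=8`, 16 atoms, `lam=0.1`) are assumptions made for illustration only.

```python
import numpy as np

def fit_pooled_pca(X_src, X_tgt, n_components):
    """Ordinary PCA on pooled source+target features (a stand-in, NOT the paper's TPCA)."""
    X = np.vstack([X_src, X_tgt])
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T  # projection matrix, shape (n_features, n_components)

def lasso_objective(X, A, D, lam):
    """0.5 * ||X - A D||_F^2 + lam * ||A||_1"""
    return 0.5 * np.sum((X - A @ D) ** 2) + lam * np.abs(A).sum()

def sparse_codes(X, D, lam=0.1, n_iter=200):
    """ISTA for the objective above with a fixed dictionary D (rows are atoms)."""
    L = np.linalg.norm(D @ D.T, 2)  # Lipschitz constant of the smooth term's gradient
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - X) @ D.T
        Z = A - grad / L
        # Soft-thresholding is the proximal step for the L1 penalty.
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)
    return A

def demo(seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical data: 20-dim features, labeled source, shifted unlabeled target.
    X_src = rng.normal(size=(60, 20))
    y_src = rng.integers(0, 2, size=60)
    X_src[y_src == 1, :3] += 2.0
    X_tgt = rng.normal(size=(40, 20)) + 0.5

    mean, W = fit_pooled_pca(X_src, X_tgt, n_components=8)
    Z_src, Z_tgt = (X_src - mean) @ W, (X_tgt - mean) @ W

    # Assumed dictionary: 16 projected samples drawn from both corpora, unit-normalized.
    Z = np.vstack([Z_src, Z_tgt])
    D = Z[rng.choice(len(Z), size=16, replace=False)]
    D = D / np.linalg.norm(D, axis=1, keepdims=True)

    return {
        "W": W, "D": D, "y_src": y_src,
        "Z_src": Z_src, "Z_tgt": Z_tgt,
        "A_src": sparse_codes(Z_src, D),
        "A_tgt": sparse_codes(Z_tgt, D),
    }

if __name__ == "__main__":
    out = demo()
    print(out["A_src"].shape, out["A_tgt"].shape)
```

In such a setup, a classifier would be trained on the source codes `A_src` with labels `y_src` and applied to the target codes `A_tgt`; the classifier and any transfer-specific regularization are left out because the abstract does not specify them.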
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Song, P., Zheng, W., Liu, J., Li, J., Zhang, X. (2015). A Novel Speech Emotion Recognition Method via Transfer PCA and Sparse Coding. In: Yang, J., Yang, J., Sun, Z., Shan, S., Zheng, W., Feng, J. (eds) Biometric Recognition. CCBR 2015. Lecture Notes in Computer Science, vol 9428. Springer, Cham. https://doi.org/10.1007/978-3-319-25417-3_46
DOI: https://doi.org/10.1007/978-3-319-25417-3_46
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25416-6
Online ISBN: 978-3-319-25417-3
eBook Packages: Computer Science, Computer Science (R0)