
Emotion recognition using semi-supervised feature selection with speaker normalization

Published in: International Journal of Speech Technology

Abstract

Feature selection is the most widely used family of dimensionality reduction methods in speech emotion recognition. However, most existing methods neither preserve the manifold structure of the data nor exploit the information carried by unlabeled data, and therefore fail to select a good feature subset for speech emotion recognition. This paper presents a semi-supervised feature selection method that preserves both the manifold structure and the category structure of the data while exploiting the information provided by unlabeled samples. To further address the manifold of speech data, which is shaped by factors such as emotion, speaker, and sentence, a new speaker normalization method is also proposed; it achieves good normalization even when only a small number of samples is available per speaker, making it applicable to most real-world speech emotion recognition systems. The conducted experiments validate the effectiveness of the proposed semi-supervised feature selection method combined with speaker normalization.
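For context, a common baseline for speaker normalization is per-speaker z-score scaling of each acoustic feature. The sketch below illustrates that baseline in plain NumPy; it is not the normalization method proposed in this paper (which specifically targets the case of few samples per speaker), and the function and variable names are illustrative only.

```python
import numpy as np

def speaker_zscore_normalize(features, speaker_ids):
    """Scale each feature to zero mean and unit variance per speaker.

    features    -- (n_samples, n_features) array of acoustic features
    speaker_ids -- length-n_samples array of speaker labels
    """
    features = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    normalized = np.empty_like(features)
    for spk in np.unique(speaker_ids):
        mask = speaker_ids == spk
        mu = features[mask].mean(axis=0)
        sigma = features[mask].std(axis=0)
        sigma[sigma == 0] = 1.0  # guard against constant features
        normalized[mask] = (features[mask] - mu) / sigma
    return normalized
```

With very few utterances per speaker, the per-speaker statistics `mu` and `sigma` become unreliable, which is precisely the limitation the proposed method aims to overcome.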


Figs. 1–5 appear in the full article.



Author information


Correspondence to Yaxin Sun.


Cite this article

Sun, Y., Wen, G. Emotion recognition using semi-supervised feature selection with speaker normalization. Int J Speech Technol 18, 317–331 (2015). https://doi.org/10.1007/s10772-015-9272-x
