Abstract
Emotional speaker recognition under real-life conditions is becoming an urgent need for several applications. This paper proposes a novel approach that combines multiple feature extraction methods with the i-vector speaker modeling technique to improve emotional speaker recognition under real conditions. The performance of the proposed approach is evaluated on realistic speech (the IEMOCAP corpus) in clean and noisy environments at various SNR levels. We examined several spectral features widely used in speaker recognition (MFCC, LPCC, and RASTA-PLP) and derived combined features, the MFCC-SDC coefficients. The feature vectors are then classified using multiclass Support Vector Machines (SVM). Experimental results show good robustness of the proposed system against talking conditions (emotions) and against real-life environments (noise). Moreover, the results reveal that MFCC-SDC features outperform conventional MFCCs.
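As an illustration of the MFCC-SDC combination mentioned above: shifted delta cepstra (SDC) stack k blocks of delta coefficients, each computed over a ±d frame window and successively shifted by P frames, on top of the base MFCCs. The sketch below is a minimal numpy implementation on a synthetic stand-in for an MFCC matrix; the N-d-P-k configuration (7-1-3-7 is a common choice in the language/speaker-recognition literature) and the function name `sdc` are assumptions, since the abstract does not give the paper's exact setup.

```python
import numpy as np

def sdc(cepstra, N=7, d=1, P=3, k=7):
    """Shifted Delta Cepstra.

    cepstra : (T, C) array of cepstral frames (e.g. MFCCs), with C >= N.
    Returns a (T, N*k) array: for each frame t, k delta blocks
    c[t + i*P + d] - c[t + i*P - d], i = 0..k-1, clipped at the edges.
    """
    T = cepstra.shape[0]
    out = np.empty((T, N * k))
    for t in range(T):
        for i in range(k):
            plus = min(t + i * P + d, T - 1)            # clip at last frame
            minus = max(min(t + i * P - d, T - 1), 0)   # clip at both ends
            out[t, i * N:(i + 1) * N] = cepstra[plus, :N] - cepstra[minus, :N]
    return out

# synthetic stand-in for an MFCC matrix: 100 frames x 13 coefficients
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((100, 13))

# MFCC-SDC: append the SDC block to the base cepstra
mfcc_sdc = np.hstack([mfcc, sdc(mfcc)])
print(mfcc_sdc.shape)  # (100, 62) -> 13 MFCC + 7*7 SDC dimensions
```

In a full pipeline these per-frame MFCC-SDC vectors would feed the i-vector extractor and the multiclass SVM back end described in the abstract.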
Mansour, A., Chenchah, F. & Lachiri, Z. Emotional speaker recognition in real life conditions using multiple descriptors and i-vector speaker modeling technique. Multimed Tools Appl 78, 6441–6458 (2019). https://doi.org/10.1007/s11042-018-6256-2