Abstract
In this paper we propose a statistics-based parametrization framework that represents speech through a fixed-length supervector, which paves the way for capturing the long-term properties of the signal. A fixed-length representation of a variable-length pattern such as speech, provided it preserves the task-relevant information, allows the use of a wide range of powerful discriminative models that cannot effectively handle variability in pattern length. In the proposed approach, a GMM is trained for each class, and the posterior probabilities of the components of all the GMMs are computed for each data instance (frame), averaged over all frames of the utterance, and finally stacked into a supervector. The main benefits of the proposed method are that it makes feature extraction task-specific and performs a remarkable dimensionality reduction while preserving the discriminative capability of the extracted features. The method yields a 7.6 % absolute performance improvement over the baseline system, a GMM-based classifier, and achieves 87.6 % accuracy on the emotion recognition task. Human performance on the employed database (Berlin) is reportedly 84.3 %.
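The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class GMMs are assumed to be already trained and are replaced here by hypothetical hand-set diagonal-covariance models, the helper names (`component_posteriors`, `supervector`) are invented for illustration, and normalizing the posteriors within each class GMM (rather than over the pooled components) is an assumption, since the abstract does not specify it.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def component_posteriors(x, gmm):
    """Posterior probability of each component of one GMM given frame x.

    Assumption: posteriors are normalized within this GMM only.
    """
    log_joint = [
        math.log(w) + log_gauss_diag(x, mean, var)
        for w, mean, var in gmm
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_joint)
    unnorm = [math.exp(lj - peak) for lj in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def supervector(frames, class_gmms):
    """Average per-frame component posteriors and stack them over all GMMs."""
    if not frames:
        raise ValueError("utterance has no frames")
    stacked = []
    for gmm in class_gmms:
        sums = [0.0] * len(gmm)
        for x in frames:
            for k, p in enumerate(component_posteriors(x, gmm)):
                sums[k] += p
        stacked.extend(s / len(frames) for s in sums)
    return stacked

# Hypothetical toy models: two classes, two components each, 2-D features.
# Each component is (weight, mean, diagonal variance).
gmm_a = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])]
gmm_b = [(0.7, [-2.0, 1.0], [0.5, 0.5]), (0.3, [2.0, -1.0], [2.0, 2.0])]

# Two utterances of different length (3 and 7 frames).
short_utt = [[0.1, -0.2], [2.9, 3.1], [0.0, 0.3]]
long_utt = short_utt + [[3.2, 2.8], [-1.9, 1.1], [2.0, -1.0], [0.5, 0.5]]

sv_short = supervector(short_utt, [gmm_a, gmm_b])
sv_long = supervector(long_utt, [gmm_a, gmm_b])
```

Both utterances map to a vector whose length equals the total number of components across the class GMMs (four in this toy setup), regardless of how many frames they contain. That fixed length is what lets a discriminative classifier consume the representation directly.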
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Loweimi, E., Doulaty, M., Barker, J., Hain, T. (2015). Long-Term Statistical Feature Extraction from Speech Signal and Its Application in Emotion Recognition. In: Dediu, AH., Martín-Vide, C., Vicsi, K. (eds) Statistical Language and Speech Processing. SLSP 2015. Lecture Notes in Computer Science, vol 9449. Springer, Cham. https://doi.org/10.1007/978-3-319-25789-1_17
DOI: https://doi.org/10.1007/978-3-319-25789-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25788-4
Online ISBN: 978-3-319-25789-1
eBook Packages: Computer Science, Computer Science (R0)