Long-Term Statistical Feature Extraction from Speech Signal and Its Application in Emotion Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9449)

Abstract

In this paper we propose a statistics-based parametrization framework that represents the speech signal through a fixed-length supervector, paving the way for capturing the long-term properties of this signal. A fixed-length representation of a variable-length pattern such as speech that preserves the task-relevant information allows a wide range of powerful discriminative models to be used, models that otherwise could not effectively handle the variability in pattern length. In the proposed approach, a GMM is trained for each class, the posterior probabilities of the components of all the GMMs are computed for each data instance (frame), averaged over all utterance frames and finally stacked into a supervector. The main benefits of the proposed method are making the feature extraction task-specific, achieving a remarkable dimensionality reduction and yet preserving the discriminative capability of the extracted features. This method leads to a 7.6 % absolute performance improvement over the baseline system, a GMM-based classifier, and results in 87.6 % accuracy in the emotion recognition task. Human performance on the employed database (Berlin) is reportedly 84.3 %.
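
To make the pipeline described above concrete, the following is a minimal sketch, not the authors' implementation. It assumes scikit-learn's GaussianMixture, frame-level features (e.g. MFCCs) stored as NumPy arrays of shape (n_frames, n_features), and an arbitrary choice of eight diagonal-covariance components per class; the helper names train_class_gmms, utterance_supervector and frames_per_class are illustrative only.

# Minimal sketch of the class-wise GMM supervector idea from the abstract.
# Assumptions (not from the paper): scikit-learn GaussianMixture, MFCC-like
# frame matrices of shape (n_frames, n_features), 8 components per class.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_class_gmms(frames_per_class, n_components=8):
    """Fit one GMM per emotion class on the pooled frames of that class."""
    gmms = {}
    for label, frame_list in frames_per_class.items():
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
        gmm.fit(np.vstack(frame_list))
        gmms[label] = gmm
    return gmms

def utterance_supervector(frames, gmms):
    """Average each class GMM's per-frame component posteriors, then stack."""
    parts = []
    for label in sorted(gmms):
        posteriors = gmms[label].predict_proba(frames)  # (n_frames, n_components)
        parts.append(posteriors.mean(axis=0))           # average over the utterance
    return np.concatenate(parts)                        # length: n_classes * n_components

The resulting supervector has a fixed length of n_classes x n_components regardless of utterance duration, so it can be fed directly to a standard discriminative classifier, which is the property the abstract argues for.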

Author information

Corresponding author

Correspondence to Mortaza Doulaty.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Loweimi, E., Doulaty, M., Barker, J., Hain, T. (2015). Long-Term Statistical Feature Extraction from Speech Signal and Its Application in Emotion Recognition. In: Dediu, A.-H., Martín-Vide, C., Vicsi, K. (eds) Statistical Language and Speech Processing. SLSP 2015. Lecture Notes in Computer Science, vol 9449. Springer, Cham. https://doi.org/10.1007/978-3-319-25789-1_17

  • DOI: https://doi.org/10.1007/978-3-319-25789-1_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25788-4

  • Online ISBN: 978-3-319-25789-1

  • eBook Packages: Computer Science, Computer Science (R0)
