Long-Term Statistical Feature Extraction from Speech Signal and Its Application in Emotion Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9449)

Abstract

In this paper we propose a statistics-based parametrization framework that represents the speech signal through a fixed-length supervector, paving the way for capturing the long-term properties of this signal. A fixed-length representation of a variable-length pattern such as speech that preserves the task-relevant information allows a wide range of powerful discriminative models to be used, models that otherwise could not effectively handle the variability in pattern length. In the proposed approach, a GMM is trained for each class, the posterior probabilities of the components of all the GMMs are computed for each data instance (frame), averaged over all utterance frames and finally stacked into a supervector. The main benefits of the proposed method are making the feature extraction task-specific, achieving a remarkable dimensionality reduction and yet preserving the discriminative capability of the extracted features. This method leads to a 7.6 % absolute performance improvement over the baseline system, a GMM-based classifier, and results in 87.6 % accuracy in the emotion recognition task. Human performance on the employed database (Berlin) is reportedly 84.3 %.
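
To make the pipeline described above concrete, the following is a minimal sketch, not the authors' implementation. It assumes scikit-learn's GaussianMixture, frame-level features (e.g. MFCCs) stored as NumPy arrays of shape (n_frames, n_features), and an arbitrary choice of eight diagonal-covariance components per class; the helper names train_class_gmms, utterance_supervector and frames_per_class are illustrative only.

# Minimal sketch of the class-wise GMM supervector idea from the abstract.
# Assumptions (not from the paper): scikit-learn GaussianMixture, MFCC-like
# frame matrices of shape (n_frames, n_features), 8 components per class.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_class_gmms(frames_per_class, n_components=8):
    """Fit one GMM per emotion class on the pooled frames of that class."""
    gmms = {}
    for label, frame_list in frames_per_class.items():
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
        gmm.fit(np.vstack(frame_list))
        gmms[label] = gmm
    return gmms

def utterance_supervector(frames, gmms):
    """Average each class GMM's per-frame component posteriors, then stack."""
    parts = []
    for label in sorted(gmms):
        posteriors = gmms[label].predict_proba(frames)  # (n_frames, n_components)
        parts.append(posteriors.mean(axis=0))           # average over the utterance
    return np.concatenate(parts)                        # length: n_classes * n_components

The resulting supervector has a fixed length of n_classes x n_components regardless of utterance duration, so it can be fed directly to a standard discriminative classifier, which is the property the abstract argues for.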

Author information

Corresponding author

Correspondence to Mortaza Doulaty.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Loweimi, E., Doulaty, M., Barker, J., Hain, T. (2015). Long-Term Statistical Feature Extraction from Speech Signal and Its Application in Emotion Recognition. In: Dediu, A.-H., Martín-Vide, C., Vicsi, K. (eds) Statistical Language and Speech Processing. SLSP 2015. Lecture Notes in Computer Science, vol 9449. Springer, Cham. https://doi.org/10.1007/978-3-319-25789-1_17

  • DOI: https://doi.org/10.1007/978-3-319-25789-1_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25788-4

  • Online ISBN: 978-3-319-25789-1

  • eBook Packages: Computer Science, Computer Science (R0)
