Speech Emotion Recognition Using Local and Global Features

  • Conference paper
Brain Informatics (BI 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10654)

Abstract

Speech offers an easy and useful way to assess speakers’ mental and psychological health, and automatic speech emotion recognition has been widely investigated in human-machine interaction, psychology, psychiatry, and related fields. In this paper, we extract prosodic and spectral features, including pitch, MFCC, intensity, ZCR and LSP, to build an emotion recognition model with an SVM classifier. In particular, we find that frame duration and overlap influence the final results, so a depth-first search is applied to find the best parameters. Experimental results on two well-known databases, EMODB and RAVDESS, show that the model works well and that our speech features are effective in characterizing and recognizing emotions.
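
The role of frame duration and overlap can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes ZCR as the only local (per-frame) feature, summarises it with a mean and standard deviation as global features, and uses a toy scoring function where a cross-validated SVM accuracy would normally go. All function names (`frame_signal`, `zero_crossing_rate`, `global_features`, `dfs_best`) and the candidate parameter values are hypothetical.

```python
import math
from statistics import mean, pstdev


def frame_signal(samples, sample_rate, frame_ms, overlap):
    """Split samples into frames of frame_ms milliseconds.

    overlap is the fraction of each frame shared with the next one.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000))
    hop = max(1, int(round(frame_len * (1 - overlap))))
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (local feature)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def global_features(samples, sample_rate, frame_ms, overlap):
    """Summarise per-frame ZCR values into utterance-level statistics."""
    local = [zero_crossing_rate(f)
             for f in frame_signal(samples, sample_rate, frame_ms, overlap)]
    return [mean(local), pstdev(local)]


def dfs_best(choices, score, partial=()):
    """Depth-first search over a tree with one level per parameter list.

    Each leaf is a full parameter tuple; score(*leaf) rates it.
    Returns (best_score, best_parameters).
    """
    if len(partial) == len(choices):
        return score(*partial), partial
    best = (float("-inf"), None)
    for value in choices[len(partial)]:
        candidate = dfs_best(choices, score, partial + (value,))
        if candidate[0] > best[0]:
            best = candidate
    return best


if __name__ == "__main__":
    rate = 16000
    # Synthetic one-second 220 Hz tone standing in for an utterance.
    signal = [math.sin(2 * math.pi * 220 * n / rate) for n in range(rate)]

    def toy_score(frame_ms, overlap):
        # Placeholder objective (assumption): a real pipeline would train
        # an SVM on the features and return its validation accuracy.
        return -global_features(signal, rate, frame_ms, overlap)[1]

    print(dfs_best([[20, 30, 40], [0.25, 0.5]], toy_score))
```

In this sketch the search tree has one level per parameter, so each root-to-leaf path fixes a frame duration and then an overlap. Replacing `toy_score` with a function that trains and validates a classifier would turn it into a parameter search of the kind the abstract describes.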

Acknowledgments

The research was supported in part by NSFC under Grants 11301504 and U1536104, and in part by the National Basic Research Program of China (973 Program) under Grant 2014CB744600.

Author information

Corresponding author

Correspondence to Baobin Li.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Gao, Y., Li, B., Wang, N., Zhu, T. (2017). Speech Emotion Recognition Using Local and Global Features. In: Zeng, Y., et al. Brain Informatics. BI 2017. Lecture Notes in Computer Science, vol 10654. Springer, Cham. https://doi.org/10.1007/978-3-319-70772-3_1

  • DOI: https://doi.org/10.1007/978-3-319-70772-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70771-6

  • Online ISBN: 978-3-319-70772-3

  • eBook Packages: Computer Science, Computer Science (R0)
