
A Statistical Based Modeling Approach for Deep Learning Based Speech Emotion Recognition

  • Conference paper
  • First Online:
Intelligent Systems Design and Applications (ISDA 2020)

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 1351))

Abstract

In this paper, we propose a speech emotion classification framework that combines a deep learning model with a statistics-based parametrization approach. The main contributions of our work are as follows: first, we build a parametrization vector by computing the mean of the Mel-frequency cepstral coefficients (MFCCs); second, by designing a simple convolutional neural network, we substantially reduce training time while maintaining satisfactory classification accuracy. The performance of the model is evaluated and discussed on the RAVDESS dataset. Results are promising in terms of reliability and accuracy: an overall average accuracy of 79.59% was achieved.
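As a sketch of the parametrization step described in the abstract: averaging each MFCC over the time axis collapses a variable-length coefficient matrix into one fixed-length vector per utterance, which is what makes utterances of different durations comparable inputs to a classifier. The matrix below is synthetic stand-in data (the shape and the use of NumPy are assumptions); in the actual pipeline the MFCCs would be extracted from a RAVDESS recording, e.g. with an audio library such as librosa.

```python
import numpy as np

# Hypothetical MFCC matrix: 40 coefficients x 200 time frames.
# In practice this would come from an extractor such as
# librosa.feature.mfcc applied to one utterance; random data
# stands in here so the sketch is self-contained.
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((40, 200))

# Statistics-based parametrization: the mean over the time axis
# yields a single 40-dimensional vector per utterance, regardless
# of how many frames the recording contains.
feature_vector = mfcc.mean(axis=1)
print(feature_vector.shape)  # (40,)
```

A 300-frame utterance would reduce to the same 40-dimensional shape, so the downstream CNN can use a fixed input size.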



Acknowledgements

The research presented in this paper is funded through the AL-Khawarizmi project: Artificial Intelligence and its Applications.


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Sekkate, S., Khalil, M., Adib, A. (2021). A Statistical Based Modeling Approach for Deep Learning Based Speech Emotion Recognition. In: Abraham, A., Piuri, V., Gandhi, N., Siarry, P., Kaklauskas, A., Madureira, A. (eds) Intelligent Systems Design and Applications. ISDA 2020. Advances in Intelligent Systems and Computing, vol 1351. Springer, Cham. https://doi.org/10.1007/978-3-030-71187-0_114
