Abstract
In this paper, we propose a speech emotion classification framework that combines a deep learning model with a statistics-based feature representation. The main contributions of our work are as follows: first, we build a parametrization vector by computing the mean value of the Mel Frequency Cepstral Coefficients (MFCCs); second, by designing a simple convolutional neural network, we substantially reduce training time while maintaining satisfactory classification accuracy. The performance of the model is evaluated and discussed on the RAVDESS dataset. The results are promising in terms of reliability and accuracy, with an overall average accuracy of 79.59%.
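The mean-MFCC parametrization described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frame-level MFCC matrix would in practice come from an audio library (e.g. `librosa.feature.mfcc` on a RAVDESS utterance), but here a synthetic matrix stands in for it, and the coefficient count (40) and frame count (120) are assumptions chosen for illustration.

```python
import numpy as np

def mean_mfcc_vector(mfcc: np.ndarray) -> np.ndarray:
    """Collapse a (n_mfcc, n_frames) MFCC matrix into a fixed-length
    feature vector by averaging each coefficient over time."""
    return mfcc.mean(axis=1)

# Synthetic stand-in for the frame-level MFCCs of one utterance:
# 40 coefficients x 120 frames (both counts are illustrative).
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(40, 120))

features = mean_mfcc_vector(mfcc)
print(features.shape)  # -> (40,): one value per coefficient
```

Averaging over the time axis yields a fixed-length vector regardless of utterance duration, which is what allows a simple fixed-input CNN classifier to consume utterances of varying length.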
Acknowledgements
The research presented in this paper is funded through the AL-Khawarizmi project: Artificial Intelligence and its Applications.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Sekkate, S., Khalil, M., Adib, A. (2021). A Statistical Based Modeling Approach for Deep Learning Based Speech Emotion Recognition. In: Abraham, A., Piuri, V., Gandhi, N., Siarry, P., Kaklauskas, A., Madureira, A. (eds) Intelligent Systems Design and Applications. ISDA 2020. Advances in Intelligent Systems and Computing, vol 1351. Springer, Cham. https://doi.org/10.1007/978-3-030-71187-0_114
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71186-3
Online ISBN: 978-3-030-71187-0
eBook Packages: Intelligent Technologies and Robotics (R0)