Improved Speech Emotion Classification Using Deep Neural Network

Hama Saeed, Mariwan

doi:10.1007/s00034-023-02446-8

Published: 29 July 2023

Volume 42, pages 7357–7376, (2023)

Circuits, Systems, and Signal Processing

Mariwan Hama Saeed (ORCID: orcid.org/0000-0003-4962-4239)

Abstract

Speech emotion recognition (SER), which has attracted growing attention in recent years, is a key component of human–computer interaction. Although a wide range of SER strategies has been proposed, recognition performance still leaves room for improvement. This study proposes a deep neural network model for classifying emotions in speech. The approach has three stages: feature extraction, normalization, and emotion recognition. During feature extraction, the Librosa Python toolkit is used to obtain MFCC, Mel-spectrogram, chroma, and poly features. SMOTE (synthetic minority oversampling technique) is used to augment the minority class, and a Min–Max scaler is used for normalization. The model was evaluated on three widely used languages, German, English, and French, using the Berlin Emotional Speech Database (EMODB), the Surrey Audio-Visual Expressed Emotion dataset (SAVEE), and the Canadian French Emotional (CaFE) speech dataset, respectively. In speaker-dependent experiments, it achieved unweighted accuracies of 95% on EMODB, 90% on SAVEE, and 92% on CaFE. The results indicate that the proposed method recognizes emotions effectively and outperforms the approaches used for comparison on the reported performance indicators.
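To make the augmentation and normalization steps concrete, the following is a minimal sketch in plain Python, not the author's implementation. It shows the core idea behind SMOTE (a synthetic minority sample is interpolated between a real minority sample and one of its nearest minority neighbours) and Min–Max scaling of each feature column to [0, 1]. The toy feature vectors, the helper names `smote_like` and `min_max_scale`, the choice of `k`, and the order of the two steps are all assumptions for illustration; the abstract does not specify them. In practice one would more likely use `SMOTE` from imbalanced-learn and `MinMaxScaler` from scikit-learn on features extracted with Librosa.

```python
import random


def min_max_scale(rows):
    """Rescale every feature column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def smote_like(minority, n_new, k=2, seed=0):
    """Interpolate n_new synthetic samples between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical two-dimensional feature vectors (made-up values).
majority = [[1.0, 200.0], [2.0, 220.0], [3.0, 210.0], [2.5, 230.0], [1.5, 205.0]]
minority = [[8.0, 400.0], [9.0, 420.0], [10.0, 380.0]]

# Oversample the minority class until both classes have the same size.
synthetic = smote_like(minority, n_new=len(majority) - len(minority))
balanced = majority + minority + synthetic

# Normalize all features to [0, 1] before feeding them to a classifier.
scaled = min_max_scale(balanced)
```

Because each synthetic point lies on a line segment between two real minority samples, the oversampled class stays inside the region already covered by the minority data rather than duplicating existing rows.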

Data Availability and Materials

Available upon request.

Funding

No funding was received.

Author information

Authors and Affiliations

College of Basic Education, University of Halabja, Halabja, Iraq
Mariwan Hama Saeed

Corresponding author

Correspondence to Mariwan Hama Saeed.

Ethics declarations

Conflict of interest

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hama Saeed, M. Improved Speech Emotion Classification Using Deep Neural Network. Circuits Syst Signal Process 42, 7357–7376 (2023). https://doi.org/10.1007/s00034-023-02446-8

Received: 16 November 2022
Revised: 28 June 2023
Accepted: 29 June 2023
Published: 29 July 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s00034-023-02446-8
