An Improved Convolutional Neural Network for Speech Emotion Recognition

Butt, Sibtain Ahmed; Iqbal, Umer; Ghazali, Rozaida; Shoukat, Ijaz Ali; Lasisi, Ayodele; Al-Saedi, Ahmed Khalaf Zager

doi:10.1007/978-3-031-00828-3_19

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 457)

Included in the following conference series:

International Conference on Soft Computing and Data Mining

Abstract

Speech emotion recognition is a challenging and pressing task in data science. Existing studies have focused only on one-dimensional Convolutional Neural Network (CNN) architectures for speech emotion recognition. With non-optimal parameters, these one-dimensional architectures achieve low recognition accuracy on the RAVDESS, TESS and URDU datasets. To overcome this problem, this work proposes an efficient two-dimensional CNN architecture with an optimized combination of parameters to achieve better accuracy. The proposed method is compared, in terms of accuracy, with a Support Vector Machine (SVM) and a one-dimensional CNN on the RAVDESS, TESS and URDU datasets. In the experiments conducted, the proposed method outperformed both baselines, with accuracies of 76.08% and 99.68%, respectively.
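
As a rough illustration of the distinction the abstract draws between one-dimensional and two-dimensional convolution, the following pure-Python sketch applies both to a toy time-frequency feature map and computes accuracy as the fraction of correct predictions. It is not the chapter's architecture, which this preview does not describe: the feature map, the kernels, the labels and the function names (`conv1d_valid`, `conv2d_valid`, `accuracy`) are all assumptions made for illustration.

```python
# Illustrative only: toy data and hypothetical helper names, not the
# architecture or parameters used in the chapter.
# "Convolution" here follows the CNN convention (no kernel flip).

def conv1d_valid(signal, kernel):
    """Slide a 1D kernel along a single axis (e.g. time) of a sequence."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def conv2d_valid(feature_map, kernel):
    """Slide a 2D kernel along both axes (frequency and time) of a map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature_map) - kh + 1
    out_w = len(feature_map[0]) - kw + 1
    return [
        [
            sum(
                feature_map[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy feature map: 4 frequency bins (rows) x 5 time frames (columns).
feature_map = [
    [1, 2, 3, 4, 5],
    [0, 1, 0, 1, 0],
    [2, 2, 2, 2, 2],
    [5, 4, 3, 2, 1],
]

# 1D case: the kernel sees one row and moves along time only.
row_out = conv1d_valid(feature_map[0], [1, 0, -1])

# 2D case: the kernel spans neighbouring frequency bins and time frames.
map_out = conv2d_valid(feature_map, [[1, 0], [0, 1]])

# Accuracy on made-up emotion labels.
acc = accuracy(["happy", "sad", "angry", "sad"],
               ["happy", "sad", "sad", "sad"])
```

In this sketch the 2D kernel combines values from adjacent frequency bins as well as adjacent time frames, whereas the 1D kernel only combines values along a single row. That is the structural difference between the two families of architecture the abstract compares.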

Acknowledgement

This research was supported by the Universiti Tun Hussein Onn Malaysia (UTHM) through the Multidisciplinary Research Grant (MDR) (Vote H494).

Author information

Authors and Affiliations

Riphah College of Computing, Riphah International University Faisalabad Campus, Faisalabad, Pakistan
Sibtain Ahmed Butt, Umer Iqbal & Ijaz Ali Shoukat
Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Malaysia
Rozaida Ghazali
Department of Mathematical Sciences, Faculty of Science, Augustine University, Ilara-Epe, Lagos, 106101, Nigeria
Ayodele Lasisi
Physics Department, College of Science, University of Misan, Amarah, Iraq
Ahmed Khalaf Zager Al-Saedi

Corresponding author

Correspondence to Umer Iqbal.

Editor information

Editors and Affiliations

Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Malaysia
Rozaida Ghazali
Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Malaysia
Nazri Mohd Nawi
Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Malaysia
Mustafa Mat Deris
School of Information Technology Faculty of Science, Engineering and Built Environment, Deakin University, Geelong, VIC, Australia
Jemal H. Abawajy
Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Malaysia
Nureize Arbaiy

About this paper

Cite this paper

Butt, S.A., Iqbal, U., Ghazali, R., Shoukat, I.A., Lasisi, A., Al-Saedi, A.K.Z. (2022). An Improved Convolutional Neural Network for Speech Emotion Recognition. In: Ghazali, R., Mohd Nawi, N., Deris, M.M., Abawajy, J.H., Arbaiy, N. (eds) Recent Advances in Soft Computing and Data Mining. SCDM 2022. Lecture Notes in Networks and Systems, vol 457. Springer, Cham. https://doi.org/10.1007/978-3-031-00828-3_19

DOI: https://doi.org/10.1007/978-3-031-00828-3_19
Published: 04 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-00827-6
Online ISBN: 978-3-031-00828-3
eBook Packages: Intelligent Technologies and Robotics (R0)
