Improving Speech Emotion Recognition System Using Spectral and Prosodic Features

Chakhtouna, Adil; Sekkate, Sara; Adib, Abdellah

doi:10.1007/978-3-030-96308-8_37

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 418)

Included in the following conference series:

International Conference on Intelligent Systems Design and Applications

Abstract

The detection of emotions from speech is key to understanding human behavior, and Speech Emotion Recognition (SER) plays an important role in a diverse range of applications, especially in human-computer communication. The main aim of this study is to build two Machine Learning (ML) models able to classify input speech into several classes of emotions. To this end, we extract a set of prosodic and spectral features from sound files and apply a feature selection method to improve the recognition rate of the proposed system. Experiments are conducted on the RAVDESS database to evaluate the accuracy of the system. We assess the performance of our models and compare them with the existing SER literature. The results indicate that the proposed system based on Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) achieves a test accuracy of \(69.67\%\) and \(65.04\%\), respectively, with 8 emotional states.
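
The extract, select, classify pipeline summarized in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything specific in it is an assumption made for illustration and is not taken from the chapter: the three frame-level features (mean energy, energy spread, zero-crossing rate) stand in for the prosodic and spectral feature set, a Fisher-score ranking stands in for the unspecified feature selection method, noisy sine waves stand in for RAVDESS recordings, and only a KNN classifier is shown. All function names are hypothetical.

```python
import math
import random
from collections import Counter

def extract_features(signal, frame_len=256, hop=128):
    """Frame the signal and return [mean energy, energy std, mean ZCR]."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    energy = [sum(x * x for x in f) / frame_len for f in frames]
    zcr = [sum((a >= 0) != (b >= 0) for a, b in zip(f, f[1:])) / (frame_len - 1)
           for f in frames]
    mean_e = sum(energy) / len(energy)
    std_e = math.sqrt(sum((e - mean_e) ** 2 for e in energy) / len(energy))
    return [mean_e, std_e, sum(zcr) / len(zcr)]

def fisher_scores(X, y):
    """Between-class over within-class scatter, one score per feature."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mu = sum(col) / len(col)
        between = within = 0.0
        for c in set(y):
            vals = [v for v, label in zip(col, y) if label == c]
            mu_c = sum(vals) / len(vals)
            between += len(vals) * (mu_c - mu) ** 2
            within += sum((v - mu_c) ** 2 for v in vals)
        scores.append(between / (within + 1e-12))
    return scores

def fit_scaler(X):
    """Per-feature mean and standard deviation, estimated on training data."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def transform(X, means, stds, keep):
    """Standardize and keep only the selected feature indices."""
    return [[(row[j] - means[j]) / stds[j] for j in keep] for row in X]

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    nearest = sorted(zip((math.dist(a, x) for a in train_X), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def synth_utterance(freq, amp, rng, n=2048, sr=8000):
    """Noisy sine wave used as a toy stand-in for an utterance (not speech)."""
    f = freq * rng.uniform(0.9, 1.1)
    a = amp * rng.uniform(0.9, 1.1)
    return [a * math.sin(2 * math.pi * f * t / sr) + rng.gauss(0, 0.01)
            for t in range(n)]

def run_demo(seed=0):
    """Build toy data, select 2 of 3 features, and score KNN on a held-out split."""
    rng = random.Random(seed)
    classes = {"calm": (150, 0.3), "angry": (400, 0.9)}  # hypothetical labels
    X, y = [], []
    for label, (freq, amp) in classes.items():
        for _ in range(20):
            X.append(extract_features(synth_utterance(freq, amp, rng)))
            y.append(label)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    split = int(0.75 * len(idx))
    X_tr, y_tr = [X[i] for i in idx[:split]], [y[i] for i in idx[:split]]
    X_te, y_te = [X[i] for i in idx[split:]], [y[i] for i in idx[split:]]
    scores = fisher_scores(X_tr, y_tr)
    keep = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:2]
    means, stds = fit_scaler(X_tr)
    Z_tr = transform(X_tr, means, stds, keep)
    Z_te = transform(X_te, means, stds, keep)
    preds = [knn_predict(Z_tr, y_tr, z) for z in Z_te]
    accuracy = sum(p == t for p, t in zip(preds, y_te)) / len(y_te)
    return keep, accuracy
```

Calling `run_demo()` returns the indices of the retained features and the held-out accuracy on the toy data. Feature scores and scaling statistics are computed on the training split only, so the held-out score is not influenced by the test samples; accuracy on these synthetic signals says nothing about the figures reported for RAVDESS.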

Acknowledgements

This work was supported by the Ministry of Higher Education, Scientific Research and Innovation, the Digital Development Agency (DDA) and the CNRST of Morocco (Alkhawarizmi/2020/01).

Author information

Authors and Affiliations

Team Computer Science, Artificial Intelligence and Big Data, MCSA Laboratory, Faculty of Sciences and Technology of Mohammedia, Hassan II University of Casablanca, Casablanca, Morocco
Adil Chakhtouna & Abdellah Adib
Higher National School of Arts and Crafts of Casablanca, Casablanca, Morocco
Sara Sekkate

Corresponding author

Correspondence to Adil Chakhtouna.

Editor information

Editors and Affiliations

Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Ajith Abraham
Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Niketa Gandhi
Institut für Wirtschaftsinformatik, Fachhochschule Nordwestschweiz, Olten, Switzerland
Thomas Hanne
Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan
Tzung-Pei Hong
Federal University of Bahia, Ondina, Brazil
Tatiane Nogueira Rios
Nantong University, Nantong Shi, Jiangsu, China
Weiping Ding

Cite this paper

Chakhtouna, A., Sekkate, S., Adib, A. (2022). Improving Speech Emotion Recognition System Using Spectral and Prosodic Features. In: Abraham, A., Gandhi, N., Hanne, T., Hong, TP., Nogueira Rios, T., Ding, W. (eds) Intelligent Systems Design and Applications. ISDA 2021. Lecture Notes in Networks and Systems, vol 418. Springer, Cham. https://doi.org/10.1007/978-3-030-96308-8_37

DOI: https://doi.org/10.1007/978-3-030-96308-8_37
Published: 27 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96307-1
Online ISBN: 978-3-030-96308-8
eBook Packages: Intelligent Technologies and Robotics (R0)