A Comparative Study on MFCC and Fundamental Frequency Based Speech Emotion Classification

Shah, Asfahan; Bhowmik, Tanmay

doi:10.1007/978-3-030-94876-4_12


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13145)

Included in the following conference series:

International Conference on Distributed Computing and Internet Technology


Abstract

Speech emotion recognition and classification is one of the most important emerging fields in artificial intelligence, with applications ranging from medical science to smart home devices. Input feature selection is a very important part of speech processing. Mel Frequency Cepstral Coefficients (MFCCs) are the most widely used features in audio processing, while the fundamental frequency also plays an important role in emotion-related data. This study presents a comparative analysis to determine which of the two is the better feature for emotion classification. The Emo-DB database was used for the study. For the classification task, a Support Vector Machine classifier was used with radial basis function and sigmoid kernels. The model was trained on each of the two audio features and the performances were compared. Better performance was observed with MFCCs, indicating that they are the better-performing speech feature for the emotion classification task.
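The abstract does not say how the fundamental frequency was extracted, so the following is only a minimal sketch of one common approach: picking the lag that maximises the autocorrelation of a short frame. The function names `estimate_f0` and `synth_tone`, the 50 to 500 Hz search range, and the synthetic test tone are all assumptions made for illustration, not details from the study.

```python
import math


def estimate_f0(signal, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate the fundamental frequency (Hz) of one frame by autocorrelation.

    Illustrative sketch only: the search range and the plain (unnormalised)
    autocorrelation are assumptions, not the method used in the study.
    """
    # Convert the frequency search range into a lag range in samples.
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(signal) - 1)

    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation: fewer terms at larger lags, which
        # biases the peak towards the shortest matching period.
        overlap = len(signal) - lag
        score = sum(signal[i] * signal[i + lag] for i in range(overlap))
        if score > best_score:
            best_lag, best_score = lag, score

    # The best lag is the estimated period in samples.
    return sample_rate / best_lag


def synth_tone(freq, sample_rate, duration):
    """Generate a pure sine tone as a stand-in for a voiced speech frame."""
    n = int(sample_rate * duration)
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]


SAMPLE_RATE = 8000  # Hz, hypothetical
frame = synth_tone(200.0, SAMPLE_RATE, 0.05)  # 50 ms frame of a 200 Hz tone
f0 = estimate_f0(frame, SAMPLE_RATE)
print(f"Estimated F0: {f0:.1f} Hz")
```

In a comparison like the one described, a per-frame estimate of this kind would be summarised over each utterance (for example by its mean and range) before being passed to the classifier. Real speech would also need a voiced/unvoiced decision, which this sketch omits.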




Author information

Authors and Affiliations

Bennett University, Greater Noida, Uttar Pradesh, India
Asfahan Shah & Tanmay Bhowmik


Editor information

Editors and Affiliations

International Institute of Information Technology, Hyderabad, India
Raju Bapi
Michigan State University, East Lansing, MI, USA
Sandeep Kulkarni
Ericsson India Global Services Private Ltd., Bangalore, India
Swarup Mohalik
Indian Institute of Technology Hyderabad, Kandi, Telangana, India
Sathya Peri


Cite this paper

Shah, A., Bhowmik, T. (2022). A Comparative Study on MFCC and Fundamental Frequency Based Speech Emotion Classification. In: Bapi, R., Kulkarni, S., Mohalik, S., Peri, S. (eds) Distributed Computing and Intelligent Technology. ICDCIT 2022. Lecture Notes in Computer Science, vol 13145. Springer, Cham. https://doi.org/10.1007/978-3-030-94876-4_12


DOI: https://doi.org/10.1007/978-3-030-94876-4_12
Published: 17 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-94875-7
Online ISBN: 978-3-030-94876-4
eBook Packages: Computer Science, Computer Science (R0)
