Abstract
As technology advances, our reliance on machines grows, making effective Speech Emotion Recognition (SER) essential for natural human-machine interaction. This paper introduces a feature extraction technique, Linear Frequency Residual Cepstral Coefficients (LFRCC), for the SER task. To the best of our knowledge, this is the first attempt to employ LFRCC for SER. Experiments were conducted on the widely used EmoDB dataset, focusing on four emotions: anger, happiness, neutrality, and sadness. The proposed LFRCC features outperform the state-of-the-art Mel Frequency Cepstral Coefficients (MFCC) and Linear Frequency Cepstral Coefficients (LFCC) by relative margins of 25.64% and 10.26%, respectively, with a residual neural network (ResNet) classifier, and by 12.37% and 4.67%, respectively, with a Time-Delay Neural Network (TDNN) classifier. The proposed features also achieve a lower Equal Error Rate (EER) than both baselines. In addition, classifier-level and score-level fusion were explored; score-level fusion of MFCC and LFRCC achieved the highest accuracy of 94.87% and the lowest EER of 3.625%. The improved performance of the proposed feature set may stem from its ability to capture excitation source information via linearly spaced subbands in the cepstral domain.
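The abstract describes extracting cepstral features from the excitation source via linearly spaced subbands. A minimal sketch of such a pipeline is given below: compute the linear-prediction (LP) residual by inverse filtering, then apply a linearly spaced triangular filterbank and a DCT to each frame. The frame length, hop, filter count, LP order, and function names here are illustrative assumptions, not the authors' exact configuration:

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.fft import dct

def lp_residual(x, order=10):
    """LP residual via autocorrelation-method LPC and inverse filtering."""
    # Autocorrelation at lags 0..order.
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    # Solve the Toeplitz normal equations R a = r for predictor coefficients.
    a = solve_toeplitz(r[:order], r[1:order + 1])
    # Prediction x_hat[n] = sum_k a_k x[n-k]; residual is the prediction error.
    pred = np.convolve(x, np.concatenate(([0.0], a)), mode="full")[:len(x)]
    return x - pred

def lfrcc(x, n_fft=512, hop=160, n_filters=40, n_ceps=13, lp_order=10):
    """Sketch of LFRCC: cepstral features of the LP residual with a
    linearly spaced filterbank (parameter choices are assumptions)."""
    e = lp_residual(x, lp_order)
    # Frame and window the residual signal.
    n_frames = 1 + (len(e) - n_fft) // hop
    frames = np.stack([e[i * hop:i * hop + n_fft] for i in range(n_frames)])
    frames *= np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Triangular filters with linearly spaced center frequencies over 0..sr/2.
    edges = np.linspace(0, n_fft // 2, n_filters + 2).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(n_filters):
        lo, c, hi = edges[m], edges[m + 1], edges[m + 2]
        fbank[m, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[m, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log filterbank energies into cepstral coefficients.
    return dct(log_energy, type=2, axis=1, norm="ortho")[:, :n_ceps]
```

In contrast to MFCC, no mel warping is applied: the linearly spaced subbands give equal resolution across the spectrum, which the paper argues helps capture excitation source cues.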
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Hora, B.S., Uthiraa, S., Patil, H.A. (2023). Linear Frequency Residual Cepstral Coefficients for Speech Emotion Recognition. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science, vol. 14338. Springer, Cham. https://doi.org/10.1007/978-3-031-48309-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48308-0
Online ISBN: 978-3-031-48309-7
eBook Packages: Computer Science (R0)