
Linear Frequency Residual Cepstral Coefficients for Speech Emotion Recognition

  • Conference paper
Speech and Computer (SPECOM 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14338)


Abstract

As technology advances, our reliance on machines grows, necessitating effective approaches to Speech Emotion Recognition (SER) for improved human-machine interaction. This paper introduces a feature extraction technique, Linear Frequency Residual Cepstral Coefficients (LFRCC), for the SER task. To the best of our knowledge, this is the first attempt to employ LFRCC for SER. Experimental evaluations were conducted on the widely used EmoDB dataset, focusing on four emotions: anger, happiness, neutrality, and sadness. The proposed LFRCC features outperform the state-of-the-art Mel Frequency Cepstral Coefficients (MFCC) and Linear Frequency Cepstral Coefficients (LFCC) by relative margins of 25.64% and 10.26%, respectively, with a residual neural network (ResNet) classifier, and 12.37% and 4.67%, respectively, with a Time-Delay Neural Network (TDNN) classifier. The proposed features also achieve a lower Equal Error Rate (EER) than both baselines. Additionally, classifier-level and score-level fusion techniques were employed; score-level fusion of MFCC and LFRCC achieved the highest accuracy of 94.87% and the lowest EER of 3.625%. The improved performance of the proposed feature set may stem from its ability to capture excitation source information through linearly spaced subbands in the cepstral domain.
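The abstract does not spell out the LFRCC pipeline. A minimal sketch, assuming the common recipe of computing the linear prediction (LP) residual per frame and then extracting LFCC-style features from it (a linearly spaced triangular filterbank followed by a DCT), might look like the following. All function names, frame parameters, and filterbank sizes here are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of LFRCC-style feature extraction: LP residual per frame,
# then cepstral features from a linearly spaced (not mel) filterbank.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.fftpack import dct

def lp_residual(frame, order=12):
    # Autocorrelation method: solve the Toeplitz normal equations for the
    # LP coefficients a_k, then subtract the prediction from the frame.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    # Predicted sample: s_hat[n] = sum_k a_k * s[n - k]
    pred = np.convolve(frame, np.concatenate(([0.0], a)))[:len(frame)]
    return frame - pred

def linear_filterbank(n_filters, n_fft, sr):
    # Triangular filters with linearly spaced center frequencies,
    # unlike the mel-warped axis used by MFCC.
    edges = np.linspace(0, sr / 2, n_filters + 2)
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        if c > lo:
            fb[i, lo:c] = (np.arange(lo, c) - lo) / (c - lo)
        if hi > c:
            fb[i, c:hi] = (hi - np.arange(c, hi)) / (hi - c)
    return fb

def lfrcc(signal, sr=16000, frame_len=400, hop=160, n_filters=26, n_ceps=13):
    # Frame the signal, take the LP residual (excitation source estimate),
    # and compute log-filterbank energies + DCT on the residual spectrum.
    fb = linear_filterbank(n_filters, frame_len, sr)
    win = np.hamming(frame_len)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        res = lp_residual(frame) * win
        spec = np.abs(np.fft.rfft(res, frame_len)) ** 2
        log_energy = np.log(fb @ spec + 1e-10)
        feats.append(dct(log_energy, norm="ortho")[:n_ceps])
    return np.array(feats)
```

Because the residual removes the vocal-tract envelope, the resulting cepstral coefficients emphasize excitation source information, which the paper argues is useful for emotion discrimination.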




Author information

Corresponding author

Correspondence to Baveet Singh Hora.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Hora, B.S., Uthiraa, S., Patil, H.A. (2023). Linear Frequency Residual Cepstral Coefficients for Speech Emotion Recognition. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science, vol 14338. Springer, Cham. https://doi.org/10.1007/978-3-031-48309-7_10

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-48309-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-48308-0

  • Online ISBN: 978-3-031-48309-7

  • eBook Packages: Computer Science (R0)
