A novel approach to detect instant emotion change through spectral variation in single frequency filtering spectrogram of each pitch cycle


Abstract

An intelligent human-computer interface should not only infer emotion automatically but also provide information about emotion change. The former has produced promising results, while the latter is still being researched. Emotional transformation, a rapid change in one's emotional state, is a normal part of life that can be triggered by mental stress, ongoing situations, or the people we interact with. Recognizing triggers and early warning signs of imminent emotional swings is vital to managing these emotions better. The single frequency filtering (SFF) spectrogram is a visual representation of speech that captures high temporal and spectral resolution simultaneously. In this study, the convolutional neural network (CNN) EfficientNetB0 is used to learn patterns in the SFF spectrogram for localizing the instants of emotion change. The proposed method operates in two stages: the first constructs SFF spectrograms from the speech samples of each pitch cycle, and the second predicts the time at which an emotion change occurs. Performance is evaluated using binary accuracy (BAC), binary cross-entropy loss (BECL), binary error (BError), and F1-score. The proposed method attains accuracies of 0.95 and 0.952 on the two datasets used, and the experimental results on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) and Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) datasets establish its superiority over existing methods and other CNN architectures.
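As a rough illustration of the first stage, the following is a minimal Python sketch of a standard SFF envelope spectrogram: the signal is shifted to each analysis frequency by a complex exponential and passed through a single-pole filter whose pole lies near z = -1, so the filter output envelope tracks the amplitude at that frequency with high temporal resolution. The frequency grid and pole radius r are illustrative assumptions rather than the paper's exact settings, and the per-pitch-cycle segmentation step is omitted.

```python
import numpy as np
from scipy.signal import lfilter

def sff_spectrogram(signal, fs, freqs=None, r=0.995):
    """Single frequency filtering (SFF) envelope spectrogram (illustrative sketch)."""
    if freqs is None:
        # Assumed analysis grid: 100 Hz to Nyquist in 20 Hz steps.
        freqs = np.arange(100.0, fs / 2, 20.0)
    n = np.arange(len(signal))
    env = np.empty((len(freqs), len(signal)))
    for k, f in enumerate(freqs):
        w = np.pi - 2.0 * np.pi * f / fs        # shift f so the pole at z = -r selects it
        shifted = signal * np.exp(1j * w * n)   # complex frequency shift
        y = lfilter([1.0], [1.0, r], shifted)   # single-pole filter H(z) = 1 / (1 + r z^-1)
        env[k] = np.abs(y)                      # amplitude envelope at frequency f
    return env
```

Stacking the envelope columns that fall within one pitch cycle would then yield the per-cycle spectrogram image fed to the CNN. The second stage reduces to a binary decision (emotion change vs. no change) at each instant; below is a minimal Keras sketch of such a classifier. EfficientNetB0, the binary cross-entropy loss (BECL), and binary accuracy (BAC) follow the abstract, while the 224x224 input size, sigmoid head, and Adam optimizer are assumptions for illustration.

```python
import tensorflow as tf

# EfficientNetB0 backbone with a sigmoid head for change / no-change per pitch cycle.
base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights=None,
    input_shape=(224, 224, 3), pooling="avg")
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer="adam",                  # assumed; not stated in the abstract
    loss="binary_crossentropy",        # BECL in the paper's terminology
    metrics=[tf.keras.metrics.BinaryAccuracy(name="bac")],
)
```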



Data availability

The data, materials, and software applications used in this study were developed by the present authors.

Code availability

The code was written by Shalini Kapoor.


Author information


Contributions

The methodology was developed by Shalini Kapoor. The literature survey was done jointly by all authors.

Corresponding author

Correspondence to Shalini Kapoor.

Ethics declarations

Ethical approval

All authors confirm that ethical approval requirements were met.

Consent to participate

All authors consented to participate.

Consent to publish

All authors consent to the publication of the paper in this journal.

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kapoor, S., Kumar, T. A novel approach to detect instant emotion change through spectral variation in single frequency filtering spectrogram of each pitch cycle. Multimed Tools Appl 82, 9413–9429 (2023). https://doi.org/10.1007/s11042-022-13731-0

