Abstract
An intelligent human–computer interface should not only infer emotion automatically but also provide information about emotion change. The former has produced promising results, while the latter is still an open research problem. Emotion change, a rapid shift in one's emotional state, is a normal part of life that can be triggered by mental stress, ongoing situations, or the people we interact with. Recognizing the triggers and early warning signs of an imminent emotional swing is vital for managing these emotions. The single frequency filtering (SFF) spectrogram is a visual representation of speech that captures high temporal and frequency resolution simultaneously. In this study, the convolutional neural network (CNN) EfficientNetB0 is used to learn patterns in SFF spectrograms for localizing the instants of emotion change. The proposed detection process has two stages: the first constructs SFF spectrograms from the speech samples of each pitch cycle, and the second predicts the time instants at which emotion changes occur. The performance of the proposed method is evaluated using binary accuracy (BAC), binary cross-entropy loss (BECL), binary error (BError), and F1-score. The proposed method obtains accuracies of 0.95 and 0.952 on the datasets used. Experimental results on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) and Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) datasets establish the superiority of the proposed method over existing methods and other CNN architectures.
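The SFF representation behind the spectrograms described above can be sketched in code. This is a minimal illustration of single frequency filtering in the spirit of Aneeja and Yegnanarayana (2015), not the authors' implementation: the pole radius `r = 0.995`, the frequency grid, and the plain-Python filter loop are assumptions for illustration, and the pitch-cycle segmentation and EfficientNetB0 stages of the proposed method are omitted.

```python
import numpy as np

def sff_spectrogram(signal, fs, freqs, r=0.995):
    """Single frequency filtering (SFF) envelopes, one row per analysis frequency.

    Each frequency f is heterodyned to pi rad/sample, and the shifted signal
    is passed through a single-pole filter with its pole at z = -r; the filter
    then acts as a very narrow resonator around f in the original signal, and
    the magnitude of its output is the SFF envelope at f.
    """
    n = np.arange(len(signal))
    env = np.empty((len(freqs), len(signal)))
    for k, f in enumerate(freqs):
        w_shift = np.pi - 2.0 * np.pi * f / fs        # shifts f up to pi rad/sample
        shifted = signal * np.exp(1j * w_shift * n)   # complex frequency shift
        y = np.empty(len(signal), dtype=complex)
        acc = 0.0 + 0.0j
        for m, v in enumerate(shifted):               # y[m] = -r * y[m-1] + x[m]
            acc = -r * acc + v
            y[m] = acc
        env[k] = np.abs(y)                            # amplitude envelope at f
    return env

# Toy check: a 500 Hz tone should yield the strongest envelope
# in the 500 Hz row of the SFF "spectrogram".
fs = 8000
t = np.arange(2000) / fs
tone = np.sin(2 * np.pi * 500 * t)
env = sff_spectrogram(tone, fs, freqs=[250, 500, 1000])
```

Because the pole lies very close to the unit circle, each row of the output has an extremely narrow bandwidth around its analysis frequency while still being computed sample by sample, which is why SFF offers good frequency and temporal resolution at the same time.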

Data availability
The data, materials, and software applications used in this study were developed by the present authors.
Code availability
The code was developed by Shalini Kapoor.
References
Abbaschian BJ, Sierra-Sosa D, Elmaghraby A (2021) Deep learning techniques for speech emotion recognition, from databases to models. Sensors 21(4):1249
Alisamir S, Ringeval F (2021) On the evolution of speech representations for affective computing: a brief history and critical overview. IEEE Signal Process Mag 38(6):12–21
Altun H, Polat G (2009) Boosting selection of speech related features to improve performance of multi-class SVMs in emotion detection. Expert Syst Appl 36(4):8197–8203
Aneeja G, Yegnanarayana B (2015) Single frequency filtering approach for discriminating speech and nonspeech. IEEE Trans Audio Speech Lang Process 23(4):705–717. https://doi.org/10.1109/TASLP.2015.2404035
Aneeja G, Yegnanarayana B (2017) Extraction of fundamental frequency from degraded speech using temporal envelopes at high SNR frequencies. IEEE/ACM Trans Audio Speech Language Process 25(4):829–838. https://doi.org/10.1109/TASLP.2017.2666425
Badshah AM, … Baik SW (2019) Deep features-based speech emotion recognition for smart affective services. Multimed Tools Appl 78(5):5571–5589. https://doi.org/10.1007/s11042-017-5292-7
Bakhshi A, Harimi A, Chalup S (2022) CyTex: transforming speech to textured images for speech emotion recognition. Speech Commun 139:62–75
Ben-Ze’ev A (2003) Privacy, emotional closeness, and openness in cyberspace. Comput Hum Behav 19(4):451–467
Busso C, … Narayanan SS (2008) IEMOCAP: interactive emotional dyadic motion capture database. Lang Resour Eval 42(4):335–359. https://doi.org/10.1007/s10579-008-9076-6
Busso C, Lee S, Narayanan S (2009) Analysis of emotionally salient aspects of fundamental frequency for emotion detection. IEEE Trans Audio Speech Lang Process 17(4):582–596
Cowie R, Cornelius RR (2003) Describing the emotional states that are expressed in speech. Speech Comm 40(1–2):5–32
Davidson RJ (1998) Affective style and affective disorders: perspectives from affective neuroscience. Cognit Emot 12:307–330. https://doi.org/10.1080/026999398379628
Fredrickson BL, … Tugade MM (2000) The undoing effect of positive emotions. Motiv Emot 24(4):237–258
Gupta S, Fahad M, Deepak A (2020) Pitch-synchronous single frequency filtering spectrogram for speech emotion recognition. Multimed Tools Appl 79(31):23347–23365.
Huang Z (2015) An investigation of emotion changes from speech. In: 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), pp 733–736. https://doi.org/10.1109/ACII.2015.7344650
Huang Z et al (2014) Speech emotion recognition using CNN. In: MM 2014 – Proceedings of the 2014 ACM Conference on Multimedia, pp 801–804. https://doi.org/10.1145/2647868.2654984
Issa D, Demirci MF, Yazici A (2020) Speech emotion recognition with deep convolutional neural networks. Biomed Signal Process Control 59:101894
Jiang W, … Li C (2019) Speech emotion recognition with heterogeneous feature unification of deep neural network. Sensors 19(12):2730. https://doi.org/10.3390/s19122730
Kadiri SR, Yegnanarayana B (2017) Epoch extraction from emotional speech using single frequency filtering approach. Speech Comm 86:52–63. https://doi.org/10.1016/j.specom.2016.11.005
Kadiri SR, Yegnanarayana B (2019) Analysis of aperiodicity in artistic Noh singing voice using an impulse sequence representation of excitation source. J Acoust Soc Am 146(6):4446–4457. https://doi.org/10.1121/1.5139225
Kim Y, Provost EM (2016) Emotion spotting: discovering regions of evidence in audio-visual emotion expressions. In: ICMI 2016 – Proceedings of the 18th ACM International Conference on Multimodal Interaction, pp 92–99. https://doi.org/10.1145/2993148.2993151
Kwon S et al (2020) A CNN-assisted enhanced audio signal processing for speech emotion recognition. Sensors 20(1):183
Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): a dynamic, multimodal set of facial and vocal expressions in North American English. PLoS One 13(5):e0196391. https://doi.org/10.1371/journal.pone.0196391
Mao Q, … Zhan Y (2014) Learning salient features for speech emotion recognition using convolutional neural networks. IEEE Trans Multimedia 16(8):2203–2213. https://doi.org/10.1109/TMM.2014.2360798
Meng H, … Wei H (2019) Speech emotion recognition from 3D log-Mel spectrograms with deep learning network. IEEE Access 7:125868–125881. https://doi.org/10.1109/ACCESS.2019.2938007
Nam Y, Lee C (2021) Cascaded convolutional neural network architecture for speech emotion recognition in noisy conditions. Sensors 21(13):4399
Sezgin MC, Gunsel B, Kurt GK (2012) Perceptual audio features for emotion detection. EURASIP J Audio, Speech, Music Process 2012(1):1–21
Suveg C, … Kendall PC (2009) Changes in emotion regulation following cognitive-behavioral therapy for anxious youth. J Clin Child Adolesc Psychol 38(3):390–401
Tan M, Le QV (2019) EfficientNet: rethinking model scaling for convolutional neural networks. In: 36th International Conference on Machine Learning (ICML 2019), pp 10691–10700
Thanaraj KP, Noel JRA, Vijayarajan R (2021) Emotion classification from speech signal based on empirical mode decomposition and non-linear features. Complex Intell Syst 7(4):1919–1934
Wani TM, … Ambikairajah E (2021) A comprehensive review of speech emotion recognition systems. IEEE Access 9:47795–47814
Zhang J, … Cui D (2018) Analysis on speech signal features of manic patients. J Psychiatr Res 98:59–63
Zhao J, Mao X, Chen L (2019) Speech emotion recognition using deep 1D & 2D CNN LSTM networks. Biomed Signal Process Control 47:312–323
Zheng WQ, Yu JS, Zou YX (2015) An experimental study of speech emotion recognition based on deep convolutional neural networks. In: 2015 international conference on affective computing and intelligent interaction (ACII), pp 827–831
Zhu J, Thagard P (2002) Emotion and action. Philos Psychol 15(1):19–36
Author information
Authors and Affiliations
Contributions
The methodology was developed by Shalini Kapoor. The literature survey was carried out jointly by all authors.
Corresponding author
Ethics declarations
Ethical approval
All authors confirm that ethical approval was obtained.
Consent to participate
All authors consent to participate.
Consent to publish
All authors consent to publish the paper in this journal.
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kapoor, S., Kumar, T. A novel approach to detect instant emotion change through spectral variation in single frequency filtering spectrogram of each pitch cycle. Multimed Tools Appl 82, 9413–9429 (2023). https://doi.org/10.1007/s11042-022-13731-0
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-022-13731-0