
Attention and Feature Selection for Automatic Speech Emotion Recognition Using Utterance and Syllable-Level Prosodic Features


Abstract

This work recognizes emotions from human speech using prosodic information represented by variations in duration, energy, and fundamental frequency (\(F_{0}\)). The speech signal is first automatically segmented into syllables. Using the syllable boundaries, prosodic features are extracted at the utterance level (15 features) and the syllable level (10 features) and used to train separate deep neural network classifiers. The effectiveness of the proposed approach is demonstrated on the German speech corpus EMOTional Sensitivity ASsistance System (EmotAsS) for people with disabilities, the dataset used for the Interspeech 2018 Atypical Affect Sub-Challenge. The initial set of prosodic features yields an unweighted average recall (UAR) of 30.15% on evaluation. Fusing the decision scores of these features with those of spectral features gives a UAR of 36.71%. This paper also employs an attention mechanism and feature selection using resampling-based recursive feature elimination (RFE) to enhance system performance. Applying attention and feature selection followed by score-level fusion improves the UAR to 36.83% for prosodic features and 40.96% for the overall fusion. Fusing the scores of the best individual system of the Atypical Affect Sub-Challenge with those of the proposed system provides a UAR of 43.71%, above the best reported test result. The effectiveness of the proposed system is also demonstrated on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database, with a UAR of 63.83%.
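
To make the evaluation protocol concrete, the sketch below illustrates weighted score-level fusion of two subsystems followed by the unweighted average recall (UAR) metric. This is a minimal sketch under stated assumptions, not the authors' implementation: the posterior-score arrays, the fusion weight `alpha`, and the function names are illustrative.

```python
# Minimal sketch (not the authors' code): score-level fusion of two
# classifiers and the unweighted average recall (UAR) metric.
# Assumes each subsystem outputs per-utterance posterior scores of shape
# (num_utterances, num_classes); the fusion weight `alpha` is illustrative.
import numpy as np

def fuse_scores(prosodic_scores, spectral_scores, alpha=0.5):
    """Weighted sum of the posterior scores of the two subsystems."""
    return alpha * prosodic_scores + (1.0 - alpha) * spectral_scores

def unweighted_average_recall(y_true, y_pred, num_classes):
    """UAR: mean of per-class recalls, so every emotion class counts equally."""
    recalls = []
    for c in range(num_classes):
        mask = (y_true == c)
        if mask.any():
            recalls.append(np.mean(y_pred[mask] == c))
    return float(np.mean(recalls))

# Toy usage with random scores for a 4-class task
rng = np.random.default_rng(0)
y_true = rng.integers(0, 4, size=200)
prosodic = rng.random((200, 4))
spectral = rng.random((200, 4))
fused = fuse_scores(prosodic, spectral, alpha=0.4)
y_pred = fused.argmax(axis=1)
print("UAR:", unweighted_average_recall(y_true, y_pred, num_classes=4))
```

UAR rather than plain accuracy is the challenge metric because the emotion classes are imbalanced; averaging per-class recalls prevents a majority class from dominating the score.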


Acknowledgements

The authors would like to thank the organizers of the Interspeech 2018 Atypical Affect Sub-Challenge for providing the decision scores of the openSMILE ComParE baseline system.

Author information

Corresponding author

Correspondence to Starlet Ben Alex.

Ethics declarations

Conflict of interest

The authors wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Alex, S.B., Mary, L. & Babu, B.P. Attention and Feature Selection for Automatic Speech Emotion Recognition Using Utterance and Syllable-Level Prosodic Features. Circuits Syst Signal Process 39, 5681–5709 (2020). https://doi.org/10.1007/s00034-020-01429-3
