An optimal model with a lower bound of recall for imbalanced speech emotion recognition

Ai, Xusheng; Sheng, Victor S.; Fang, Wei; Ling, Charles X.

doi:10.1007/s11042-020-09155-3

Published: 19 June 2020

Volume 79, pages 24281–24301, (2020)

Multimedia Tools and Applications

Xusheng Ai ORCID: orcid.org/0000-0001-5629-9134¹,
Victor S. Sheng²,
Wei Fang³ &
…
Charles X. Ling⁴

Abstract

In an early complaint warning system, we encounter a common problem: a lack of angry-emotion samples for training classification models. Moreover, recognizing angry emotion is more important than recognizing non-angry emotion. The main purpose of this paper is therefore to train an optimal model that keeps recall above a lower bound while maximizing the F₁ score. The work has three parts: 1) a variant of the F₁ score (TF₁ score) that takes both a lower bound on recall and the F₁ score into consideration; 2) a Single Emotion Deep Neural Network (SEDNN) and a training process designed to find an optimal model with the maximum TF₁ score; 3) a performance comparison of different methods on the IEMOCAP and Emo-DB databases. Extensive experiments show that, with either a BCE loss function or a focal loss function, the training process can find a model whose recall is above a high threshold and whose F₁ score is maximal. In particular, SEDNN with the focal loss function performs better than SEDNN with the BCE loss function.
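The selection idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definition of the TF₁ score, so the form used here (F₁ when recall meets the lower bound, 0 otherwise) is an assumption made for illustration, not the authors' formula. The helper names (tf1_score, select_best), the toy labels, and the candidate predictions are likewise hypothetical. The focal loss follows the standard per-sample form from Lin et al. (2017), with the commonly used defaults gamma = 2 and alpha = 0.25; the paper's actual settings are not stated in the abstract.

```python
import math


def f1_and_recall(y_true, y_pred):
    """Return (F1, recall) for the positive class (label 1, e.g. anger)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return f1, recall


def tf1_score(y_true, y_pred, recall_bound):
    """ASSUMED form of TF1: F1 if recall >= recall_bound, else 0."""
    f1, recall = f1_and_recall(y_true, y_pred)
    return f1 if recall >= recall_bound else 0.0


def bce_loss(y, p, eps=1e-7):
    """Binary cross-entropy for one sample with predicted probability p."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def focal_loss(y, p, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p = min(max(p, eps), 1 - eps)
    p_t = p if y == 1 else 1 - p
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


def select_best(candidates, y_true, recall_bound):
    """Return the name of the candidate with the highest (assumed) TF1."""
    return max(
        candidates,
        key=lambda name: tf1_score(y_true, candidates[name], recall_bound),
    )


# Hypothetical imbalanced validation set: 3 angry (1), 7 non-angry (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical predictions from three saved checkpoints.
candidates = {
    "epoch_a": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # recall 1/3, F1 0.50
    "epoch_b": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # recall 1.0, F1 0.75
    "epoch_c": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # recall 2/3, F1 0.80
}

best_loose = select_best(candidates, y_true, recall_bound=0.6)   # "epoch_c"
best_strict = select_best(candidates, y_true, recall_bound=0.9)  # "epoch_b"
```

Under this assumed definition, raising the recall bound from 0.6 to 0.9 rules out the checkpoint with the highest plain F₁ score and selects the one that recovers every angry sample. The focal loss, in turn, down-weights a confidently correct sample relative to BCE, which is the usual motivation for using it on imbalanced data.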

Acknowledgements

This research was partially supported by the National Natural Science Foundation of China under grants No. 61472267 and No. 61702351, the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under grant No. 17KJB520036, the Foundation of Key Laboratory in Science and Technology Development Project of Suzhou under grant No. SZS201609, and the Suzhou Science and Technology Plan Project under grant No. SYG201903.

Funding

This study was funded by the National Natural Science Foundation of China (grant numbers: 61472267, 61702351), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (grant number: 17KJB520036), the Foundation of Key Laboratory in Science and Technology Development Project of Suzhou (grant number: SZS201609), and the Suzhou Science and Technology Plan Project (grant number: SYG201903).

Author information

Authors and Affiliations

Software and Service Outsourcing College, Suzhou Vocational Institute of Industrial Technology, Suzhou, 215104, People’s Republic of China
Xusheng Ai
Department of Computer Science, Texas Tech University, Lubbock, TX, 79409, USA
Victor S. Sheng
School of Computer and Software, Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science and Technology, Nanjing, China
Wei Fang
Department of Computer Science, Western University, London, ON, N6A 5B7, Canada
Charles X. Ling

Corresponding authors

Correspondence to Xusheng Ai or Victor S. Sheng.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. This article does not contain any studies with animals performed by any of the authors.

Informed Consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ai, X., Sheng, V.S., Fang, W. et al. An optimal model with a lower bound of recall for imbalanced speech emotion recognition. Multimed Tools Appl 79, 24281–24301 (2020). https://doi.org/10.1007/s11042-020-09155-3

Received: 28 August 2019
Revised: 29 May 2020
Accepted: 04 June 2020
Published: 19 June 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s11042-020-09155-3
