Abstract
In an early complaint warning system, a common problem is the scarcity of angry-emotion samples for training classification models. Moreover, recognizing angry emotion matters more than recognizing non-angry emotion. The main purpose of this paper is therefore to train an optimal model that keeps recall above a lower bound while maximizing the F1 score. The work has three parts: 1) a variant of the F1 score (the TF1 score) that takes both a lower bound on recall and the F1 score into consideration; 2) a Single Emotion Deep Neural Network (SEDNN) and a training process designed to find an optimal model with the maximum TF1 score; 3) a performance comparison of different methods on the IEMOCAP and Emo-DB databases. Extensive experiments show that, with either a binary cross-entropy (BCE) loss function or a focal loss function, the training process can find a model whose recall exceeds a high threshold and whose F1 score is maximal. In particular, SEDNN with the focal loss function performs better than SEDNN with the BCE loss function.
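The two ingredients named in the abstract, an F1 variant gated by a recall lower bound and a focal loss for the rare angry class, can be illustrated with a minimal sketch. The abstract does not give the exact TF1 formula, so the gating rule below (return F1 when recall meets the bound, otherwise 0) is an assumption, and `thresholded_f1`, `min_recall`, and the default `gamma` and `alpha` values are hypothetical names and settings chosen for illustration. The focal loss follows the standard binary form, which may differ in detail from the paper's implementation.

```python
import math


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive (angry) class, label 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def thresholded_f1(y_true, y_pred, min_recall):
    """Assumed TF1-style score: F1 if recall >= min_recall, otherwise 0."""
    _, recall, f1 = precision_recall_f1(y_true, y_pred)
    return f1 if recall >= min_recall else 0.0


def binary_cross_entropy(y_true, probs, eps=1e-7):
    """Mean BCE loss; probs are predicted probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def binary_focal_loss(y_true, probs, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)
        p_t = p if t == 1 else 1.0 - p              # probability of the true class
        alpha_t = alpha if t == 1 else 1.0 - alpha  # class-balancing weight
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 0, 0]          # imbalanced: two angry samples
    y_pred = [1, 0, 1, 0, 0, 0]          # hard predictions
    probs = [0.9, 0.3, 0.6, 0.2, 0.1, 0.1]
    print(thresholded_f1(y_true, y_pred, min_recall=0.5))
    print(thresholded_f1(y_true, y_pred, min_recall=0.8))
    print(binary_cross_entropy(y_true, probs))
    print(binary_focal_loss(y_true, probs))
```

Under this assumed gating rule, model selection would keep the checkpoint with the highest `thresholded_f1` on validation data, so any checkpoint that misses the recall bound is never preferred over one that meets it. With `gamma=0` and `alpha=0.5` the focal loss reduces to half the BCE loss, which makes the two losses easy to compare.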
Acknowledgements
This research was partially supported by the National Natural Science Foundation of China under grants No. 61472267 and No. 61702351, the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under grant No. 17KJB520036, the Foundation of Key Laboratory in Science and Technology Development Project of Suzhou under grant No. SZS201609, and the Suzhou Science and Technology Plan Project under grant No. SYG201903.
Funding
This study was funded by the National Natural Science Foundation of China (grant numbers: 61472267, 61702351), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (grant number: 17KJB520036), the Foundation of Key Laboratory in Science and Technology Development Project of Suzhou (grant number: SZS201609), and the Suzhou Science and Technology Plan Project (grant number: SYG201903).
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. This article does not contain any studies with animals performed by any of the authors.
Informed Consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Ai, X., Sheng, V.S., Fang, W. et al. An optimal model with a lower bound of recall for imbalanced speech emotion recognition. Multimed Tools Appl 79, 24281–24301 (2020). https://doi.org/10.1007/s11042-020-09155-3
DOI: https://doi.org/10.1007/s11042-020-09155-3