Algorithm for speech emotion recognition classification based on Mel-frequency Cepstral coefficients and broad learning system

  • Special Issue
  • Published in Evolutionary Intelligence

Abstract

Speech plays a major role in transmitting emotional information among humans, and speech emotion recognition has become an important part of human–computer interaction, especially in systems with strict requirements on real-time performance and accuracy. To improve both, much work has been done on speech emotion feature extraction and on recognition algorithms, but recognition rates still need improvement. In this paper, we propose a speech emotion recognition method based on Mel-frequency cepstral coefficients (MFCC) and a broad learning system. After preprocessing the speech signal, 39-dimensional MFCC features are extracted. The data are labelled and standardized, a prediction model is built, and the data set is split into training and test sets at a ratio of 0.8. We experiment with the broad learning network architecture and then improve its data processing. The proposed algorithm is a neural network that does not rely on a deep structure; it requires little computation, is fast, and has a simple architecture. Experimental results show that the proposed network achieves higher accuracy than the compared methods on the CASIA Chinese emotion corpus, with a recognition rate reaching 100%. The proposed architecture therefore provides an effective method for speech emotion recognition.
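The pipeline described in the abstract can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code: the "MFCC" features here are random placeholders standing in for a real speech front end, the node counts and ridge parameter are assumptions, and the broad learning system is reduced to its core idea of random mapped-feature nodes plus enhancement nodes, with output weights solved in closed form by a regularized pseudo-inverse rather than by backpropagation through a deep stack.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for 13-dimensional static MFCCs per utterance;
# real features would come from a speech front end.
n_samples, n_static, n_classes = 200, 13, 6
static = rng.normal(size=(n_samples, n_static))

# 39-dim feature vector: static MFCCs stacked with first- and
# second-order differences (delta and delta-delta).
delta = np.gradient(static, axis=0)
delta2 = np.gradient(delta, axis=0)
X = np.hstack([static, delta, delta2])            # shape (200, 39)

# One-hot emotion labels (CASIA has six emotion classes).
y = rng.integers(0, n_classes, n_samples)
Y = np.eye(n_classes)[y]

# 0.8 train / 0.2 test split, as in the abstract.
split = int(0.8 * n_samples)
X_tr, X_te, Y_tr = X[:split], X[split:], Y[:split]

def bls_fit(X, Y, n_map=40, n_enh=60, lam=1e-3):
    """Broad learning system: flat network, closed-form output weights."""
    Wf = rng.normal(size=(X.shape[1], n_map))     # random feature-node weights
    Z = X @ Wf                                    # mapped feature nodes
    We = rng.normal(size=(n_map, n_enh))
    H = np.tanh(Z @ We)                           # enhancement nodes
    A = np.hstack([Z, H])
    # Ridge-regularized pseudo-inverse: W = (A^T A + lam I)^-1 A^T Y
    W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
    return Wf, We, W

def bls_predict(X, params):
    Wf, We, W = params
    Z = X @ Wf
    A = np.hstack([Z, np.tanh(Z @ We)])
    return A @ W

params = bls_fit(X_tr, Y_tr)
pred = bls_predict(X_te, params).argmax(axis=1)
print(pred.shape)  # (40,)
```

Because the only trainable weights are solved by a single linear-algebra step, training cost grows with the number of nodes rather than network depth, which is what makes this structure attractive for real-time recognition.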



Funding

This research is funded by the High Level Innovation Teams and Distinguished Scholars Program of Guangxi Higher Education Institutions, Gui jiao ren [2018] No. 35.

Author information

Authors and Affiliations

Authors

Contributions

Conceptualization by Zhiyou Yang, Data curation by Ying Huang, Methodology by Ying Huang, Writing original draft by Ying Huang, Software by Ying Huang, Supervision by Zhiyou Yang. All authors have both read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Ying Huang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.



About this article


Cite this article

Yang, Z., Huang, Y. Algorithm for speech emotion recognition classification based on Mel-frequency Cepstral coefficients and broad learning system. Evol. Intel. 15, 2485–2494 (2022). https://doi.org/10.1007/s12065-020-00532-3

