
Hilbert Domain Analysis of Wavelet Packets for Emotional Speech Classification

Published in: Circuits, Systems, and Signal Processing

Abstract

This work investigates the significance of Hilbert-domain characterization of wavelet packets in classifying different emotions in speech signals. The goals of this paper are to create a new emotional speech database and to introduce a new feature extraction approach that can recognize various emotions. The proposed features, wavelet cepstral coefficients (WCC), are based on Hilbert spectrum analysis of the wavelet packets of the speech signal. Speaker-independent machine learning models are developed using multiclass support vector machine (SVM) and k-nearest neighbour (KNN) classifiers. The approach is tested on a newly developed Telugu (Indian) database and the EMOVO (Italian) emotional speech database. The proposed wavelet features achieve a peak accuracy of 73.5%, improved by a further 3–5% through neighbourhood component analysis (NCA) feature selection, yielding an unweighted average recall (UAR) of 78% for database 1 and 87.50% for database 2 when the optimal wavelet features are combined with SVM classification. The proposed features outperform the baseline Mel-frequency cepstral coefficients (MFCC) and other existing methodologies tested on databases in different languages.
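As a rough illustration only (not the authors' exact implementation, which is described in the full text), the WCC pipeline summarized above — wavelet-packet decomposition of the speech frame, Hilbert-envelope characterization of each subband, and a cepstral-style DCT over the subband log energies — might be sketched as follows. The Haar filter, the decomposition level, and the per-band energy measure are illustrative assumptions:

```python
import numpy as np

def haar_split(x):
    # One Haar analysis step: half-rate approximation and detail bands
    x = x[: len(x) // 2 * 2]
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def wavelet_packet(x, level):
    # Full wavelet-packet tree: both bands are split at every level,
    # giving 2**level subband signals
    nodes = [x]
    for _ in range(level):
        nodes = [band for node in nodes for band in haar_split(node)]
    return nodes

def hilbert_envelope(x):
    # Analytic signal via the FFT; its magnitude is the Hilbert envelope
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * h))

def wcc(x, level=3):
    # Hypothetical WCC sketch: log Hilbert-envelope energy per subband,
    # decorrelated by a DCT-II, by analogy with MFCC computation
    energies = [np.log(np.mean(hilbert_envelope(band) ** 2) + 1e-12)
                for band in wavelet_packet(x, level)]
    k = len(energies)
    idx = np.arange(k)
    dct_basis = np.cos(np.pi * np.outer(idx, idx + 0.5) / k)  # DCT-II
    return dct_basis @ np.array(energies)
```

With `level=3` this yields an 8-coefficient vector per frame; such vectors would then feed a multiclass SVM or KNN classifier, as the abstract describes.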


Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

The authors would like to thank all participants from Aditya Engineering College, Andhra Pradesh, India, for recording the speech samples. We also acknowledge the Department of Electrical and Electronics, University of Stellenbosch, where the laboratory experiments were conducted.

Funding

No funding was received for this study.

Author information


Corresponding author

Correspondence to Biswajit Karan.

Ethics declarations

Competing Interests

The authors declare that they have no known financial or personal conflicts of interest that could have influenced the work reported in this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Karan, B., Kumar, A. Hilbert Domain Analysis of Wavelet Packets for Emotional Speech Classification. Circuits Syst Signal Process 43, 2224–2250 (2024). https://doi.org/10.1007/s00034-023-02544-7

