Abstract
This work investigates the significance of the Hilbert-domain characterization of wavelet packets for classifying emotions in speech signals. The goals of this paper are to create a new emotional speech database and to introduce a new feature extraction approach that can recognize various emotions. The proposed features, wavelet cepstral coefficients (WCC), are based on Hilbert spectrum analysis of the wavelet packets of the speech signal. Speaker-independent machine learning models are developed using multiclass support vector machine (SVM) and k-nearest neighbour (KNN) classifiers. The approach is tested on a newly developed Telugu (Indian) database and the EMOVO (Italian) emotional speech database. The proposed wavelet features achieve a peak accuracy of 73.5%, which NCA feature selection boosts by a further 3–5%, yielding an improved unweighted average recall (UAR) of 78% for database 1 and 87.50% for database 2 when the optimal wavelet features are combined with SVM classification. The proposed features outperform the baseline Mel-frequency cepstral coefficient (MFCC) features, and their performance exceeds that of other existing methods tested on databases in different languages.
Data Availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
The authors would like to thank all participants from Aditya Engineering College, Andhra Pradesh, India, for recording the speech samples. We also acknowledge the Department of Electrical and Electronics, University of Stellenbosch, for supporting the laboratory experiments.
Funding
The authors received no funding for this work.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Competing Interests
The authors affirm that they have no known financial or personal conflicts of interest that could have appeared to influence the research presented in this study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Karan, B., Kumar, A. Hilbert Domain Analysis of Wavelet Packets for Emotional Speech Classification. Circuits Syst Signal Process 43, 2224–2250 (2024). https://doi.org/10.1007/s00034-023-02544-7