Empirical mode decomposition based statistical features for discrimination of speech and low frequency music signal

Kumar, Arvind; Chandra, Mahesh

doi:10.1007/s11042-022-13267-3

Published: 03 June 2022

Volume 82, pages 33–58, (2023)

Multimedia Tools and Applications

Abstract

This work investigates the significance of different Empirical Mode Decomposition (EMD) based statistical features for discriminating speech from low frequency music signals (guitar signals), which mostly lie in the frequency range of 80–1200 Hz. Each speech or guitar audio sample is decomposed into 10 Intrinsic Mode Functions (IMFs). These IMFs are then analyzed for discriminatory evidence using five statistical features: Mean, Absolute Mean, Kurtosis, Variance and Skewness. The features are fed to different classifiers, and the performance of each classifier is tabulated for varying tuning parameters. Initial experiments were conducted on isolated features to shortlist those with the best discriminatory evidence. The shortlisted features were then used in different combinations and their performance is reported. An improvement of 19.13% is observed for hybrid features over isolated features. Speech samples were obtained from the Scheirer and Slaney database, and guitar samples were generated from a continuous guitar monologue uploaded on YouTube. Feature selection using the Fisher method and the F-ratio was also implemented, and the best feature vectors are reported for both algorithms. The best overall accuracy of 82.16% is reported for hybrid features with the Radial Basis Function (RBF) kernel of the SVM classifier when trained with the top 38 feature vectors obtained using the F-ratio method. The experiments indicate that Absolute Mean and Variance are the best performing features for this task.
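The feature extraction and F-ratio ranking steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the IMFs have already been produced by some EMD routine and are passed in as a 2-D array, and the function names `imf_statistical_features` and `f_ratio`, the use of population variance, non-excess kurtosis, and the particular F-ratio formula (variance of class means over mean within-class variance) are all assumptions, since the abstract does not give the exact definitions.

```python
import numpy as np

def imf_statistical_features(imfs):
    """Five statistics per IMF: mean, absolute mean, variance, skewness, kurtosis.

    imfs: array-like of shape (n_imfs, n_samples), assumed to come from EMD.
    Returns a flat vector of length 5 * n_imfs, grouped IMF by IMF.
    """
    imfs = np.asarray(imfs, dtype=float)
    mean = imfs.mean(axis=1)
    abs_mean = np.abs(imfs).mean(axis=1)
    var = imfs.var(axis=1)                     # population variance (assumption)
    std = np.sqrt(var)
    safe_std = np.where(std > 0, std, 1.0)     # avoid dividing by zero for a flat IMF
    z = (imfs - mean[:, None]) / safe_std[:, None]
    skew = (z ** 3).mean(axis=1)
    kurt = (z ** 4).mean(axis=1)               # non-excess kurtosis (assumption)
    return np.stack([mean, abs_mean, var, skew, kurt], axis=1).ravel()

def f_ratio(X, y):
    """Per-feature F-ratio: variance of the class means divided by the
    mean within-class variance (one common definition, assumed here)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    class_means = np.stack([X[y == c].mean(axis=0) for c in classes])
    within = np.stack([X[y == c].var(axis=0) for c in classes]).mean(axis=0)
    between = class_means.var(axis=0)
    return between / (within + 1e-12)          # small constant guards against 0/0

# Hypothetical usage: rank features from highest to lowest F-ratio.
# ranking = np.argsort(f_ratio(X_train, y_train))[::-1]
```

With 10 IMFs per sample, this sketch yields 5 × 10 = 50 values per sample; a ranking such as the one in the usage comment could then be truncated to the highest-scoring features before training a classifier.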

References

Alexandre-Cortizo E, Rosa-Zurera M, Lopez-Ferreras F (2005) Application of fisher linear discriminant analysis to speech/music classification. EUROCON 2005 - The International Conference on "Computer as a Tool", pp 1666–1669. https://doi.org/10.1109/EURCON.2005.1630291
Babiker A, Faye I, Mumtaz W, Malik AS, Sato H (2018) EEG in classroom: EMD features to detect situational interest of students during learning. Multimedia Tools and Applications, pp 1–21
Birajdar GK, Patil MD (2018) Speech and music classification using spectrogram based statistical descriptors and extreme learning machine. Multimedia Tools and Applications, pp 1–28
Bouzid A, Ellouze N (2004) Empirical mode decomposition of voiced speech signal. First International Symposium on Control, Communications and Signal Processing, 2004. IEEE, pp 603–606
Bykhovsky D, Hadar O (2010) Evaluation of a GLRT threshold for voiced-unvoiced decision and pitch tracking in noisy speech. 2010 IEEE 26-th Convention of Electrical and Electronics Engineers in Israel, pp 000680–000683. https://doi.org/10.1109/EEEI.2010.5662126
Flandrin P, Rilling G, Goncalves P (2004) Empirical mode decomposition as a filter bank. Signal Processing Letters, IEEE 11(2):112–114
Gu Q, Li Z, Han J (2012) Generalized Fisher score for feature selection. arXiv preprint arXiv:1202.3725. https://doi.org/10.48550/arXiv.1202.3725
Huang NE (2014) Hilbert-Huang transform and its applications (Vol. 16). World Scientific
Huang H, Pan J (2006) Speech pitch determination based on Hilbert Huang transform. Signal Process 86(4):792–803
Huang NE, Shen SS (2005) Hilbert-Huang transform and its applications. World Scientific 5
Huang NE, Shen Z, Long SR, Wu MC, Shih HH, Zheng Q, Yen NC, Tung CC, Liu HH (1998) The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 454(1971):903–995
Khonglah BK, Prasanna SM (2016) Speech/music classification using speech-specific features. Digital Signal Processing 48:71–83
Khonglah BK, Sharma R, Mahadeva Prasanna SR, (2015) Speech vs music discrimination using empirical mode decomposition. 2015 Twenty First National Conference on Communications (NCC), pp 1–6. https://doi.org/10.1109/NCC.2015.7084865
Kim SK, Chang JH (2009) Speech/music classification enhancement for 3GPP2 SMV codec based on support vector machine. IEICE Trans Fundam Electron Commun Comput Sci 92(2):630–632. https://doi.org/10.1587/transfun.E92.A.630
Lahmiri S, Gargour C, Gabrea M, (2012) Statistical features selection from intrinsic mode functions for pathologies detection in retina digital images. IECON 2012 - 38th Annual Conference on IEEE Industrial Electronics Society, pp 1585–1590. https://doi.org/10.1109/IECON.2012.6388532
Lim C, Chang JH (2015) Efficient implementation techniques of an svm-based speech/music classifier in SMV. Multimed Tools Appl 74(15):5375–5400
Moreno PJ, Rifkin R (2000) Using the Fisher kernel method for web audio classification. 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'00), vol 4, pp 2417–2420. IEEE
Panagiotakis C, Tziritas G (2002) A speech/music discriminator based on RMS and zero-crossings. 2002 11th European Signal Processing Conference, pp 1-4
Pantazis Y, Rosec O, Stylianou Y (2011) Adaptive AM–FM signal decomposition with application to speech analysis. IEEE Trans Audio Speech Lang Process 19(2):290–300
Papakostas M, Giannakopoulos T (2018) Speech-music discrimination using deep visual feature extractors. Expert Syst Appl 114:334–344
Roffo G, Melzi S, (2017) Ranking to learn: feature ranking and selection via eigenvector centrality. In new Frontiers in mining complex patterns: 5th international workshop, NFMCP 2016, held in conjunction with ECML-PKDD 2016, Riva del Garda, Italy, September 19, 2016, revised selected papers (Vol. 10312, p. 19). Springer
Ruiz-Reyes N, Vera-Candeas P, Muñoz JE, García-Galán S, Cañadas FJ (2009) New speech/music discrimination approach based on fundamental frequency estimation. Multimed Tools Appl 41(2):253–286
Sahoo JP, Ari S, Ghosh DK (2018) Hand gesture recognition using DWT and F-ratio based feature descriptor. IET Image Process 12(10):1780–1787
Saunders J, (1996) Real-time discrimination of broadcast speech/music. In ICASSP (pp. 993-996). IEEE
Scheirer E, Slaney M (1997) Construction and evaluation of a robust multi-feature speech/music discriminator. In Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on. IEEE 2:1331–1334
Seck M, Bimbot F, Zugaj D, Delyon B, (1999) Two-class signal segmentation for speech/music detection in audio tracks. In Sixth European Conference on Speech Communication and Technology
Sharma R, Prasanna SM (2015) Characterizing glottal activity from speech using empirical mode decomposition. 2015 Twenty First National Conference on Communications (NCC), pp 1–6. IEEE
Shirazi J, Ghaemmaghami S (2010) Improvement to speech-music discrimination using sinusoidal model based features. Multimed Tools Appl 50(2):415–435. https://doi.org/10.1007/s11042-009-0416-3
Tsipas N, Vrysis L, Dimoulas C, Papanikolaou G (2017) Efficient audio-driven multimedia indexing through similarity-based speech/music discrimination. Multimed Tools Appl 76(24):25603–25621. https://doi.org/10.1007/s11042-016-4315-0
Wang G, Chen XY, Qiao FL, Wu Z, Huang NE (2010) On intrinsic mode function. Adv Adapt Data Anal 2(03):277–293
Williams G, Ellis DP, (1999) Speech/music discrimination based on posterior probability features. Eurospeech 99: 6th European Conference on Speech Communication and Technology: Budapest, Hungary, September 5–9. https://doi.org/10.7916/D8KH0XRH
Wu Z, Huang NE (2004) A study of the characteristics of white noise using the empirical mode decomposition method. Proceedings of the Royal Society of London Series A: Mathematical, Physical and Engineering Sciences 460(2046):1597–1611
Wu Z, Huang NE (2009) Ensemble empirical mode decomposition: a noise-assisted data analysis method. Adv Adapt Data Anal 1(01):1–41. https://doi.org/10.1142/S1793536909000047
YouTube. (2019). Relaxing Music from Sungha Jung (The Best of). [Online] Available at: https://www.youtube.com/watch?v=IP8vBL5Q8Ac&t=338s. Accessed 05 Jan 2021
Zhang T, Kuo CCJ (2001) Audio content analysis for online audiovisual data segmentation and classification. IEEE Transactions on speech and audio processing 9(4):441–457


Acknowledgements

We, the authors, would like to thank Prof. Dr. Sandeep Singh Solanki, HOD and Professor, BIT Mesra, for his guidance and support in this research work. We would also like to extend our gratitude to Birla Institute of Technology for providing us with the facilities to conduct our research.

Author information

Authors and Affiliations

Department of ECE, Birla Institute of Technology, Ranchi, India
Arvind Kumar
Department of ECE, Reva University, Bengaluru, India
Mahesh Chandra

Corresponding author

Correspondence to Arvind Kumar.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kumar, A., Chandra, M. Empirical mode decomposition based statistical features for discrimination of speech and low frequency music signal. Multimed Tools Appl 82, 33–58 (2023). https://doi.org/10.1007/s11042-022-13267-3

Received: 02 July 2019
Revised: 13 January 2021
Accepted: 19 May 2022
Published: 03 June 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s11042-022-13267-3
