Abstract
This paper presents a model-based approach to audio content analysis that classifies segments of mixed speech-music audio signals as either speech or music. A new set of features, derived from sinusoidal modeling of audio signals, is presented and evaluated. The feature set includes the variance of the birth frequencies and the duration of the longest frequency track in the sinusoidal model, which serve as measures of harmony and signal continuity; both are discussed in detail. These features are used as inputs to an audio classifier and compared with typical features. Their performance is evaluated by classifying audio into speech and music using both Gaussian mixture model (GMM) and support vector machine (SVM) classifiers. Experimental results show that the proposed features are effective for speech/music discrimination: using only two sinusoidal model features, extracted from 1-s segments of the signal, we achieved 96.84% classification accuracy. Experimental comparisons also confirm that the sinusoidal model features outperform popular time-domain and frequency-domain features in audio classification.
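To make the two named features concrete, the following is a minimal sketch of how they might be computed once spectral peaks have been picked in each analysis frame. It is not the paper's implementation: the greedy nearest-frequency linking, the function names `track_partials` and `sinusoidal_features`, and the parameters `max_jump_hz` and `hop_s` are all assumptions introduced here for illustration, and the abstract does not specify the tracking rule or the exact feature definitions.

```python
import numpy as np


def track_partials(frame_peaks, max_jump_hz=50.0):
    """Link per-frame peak frequencies (Hz) into frequency tracks.

    Assumption: a simple greedy rule in the spirit of sinusoidal
    modeling. Each active track continues with the nearest unused
    peak in the next frame if it lies within max_jump_hz; otherwise
    the track dies. Every unmatched peak gives birth to a new track.
    """
    finished, active = [], []
    for frame_index, peaks in enumerate(frame_peaks):
        unused = sorted(peaks)
        still_active = []
        for track in active:
            if unused:
                # Index of the unused peak closest to the track's last frequency.
                j = min(range(len(unused)),
                        key=lambda k: abs(unused[k] - track["last"]))
                if abs(unused[j] - track["last"]) <= max_jump_hz:
                    track["last"] = unused.pop(j)
                    track["length"] += 1
                    still_active.append(track)
                    continue
            finished.append(track)  # no continuation found: track death
        for freq in unused:  # unmatched peaks: track births
            still_active.append({"birth_freq": freq,
                                 "birth_frame": frame_index,
                                 "last": freq,
                                 "length": 1})
        active = still_active
    return finished + active


def sinusoidal_features(frame_peaks, hop_s, max_jump_hz=50.0):
    """Return (variance of birth frequencies in Hz^2,
    duration of the longest track in seconds) for one segment.

    Assumption: duration is counted as frames times the hop size.
    """
    tracks = track_partials(frame_peaks, max_jump_hz)
    if not tracks:
        return 0.0, 0.0
    births = np.array([t["birth_freq"] for t in tracks])
    longest = max(t["length"] for t in tracks) * hop_s
    return float(np.var(births)), float(longest)


# Hypothetical segment: two steady partials held over four frames.
steady = [[440.0, 880.0]] * 4
var_birth, longest = sinusoidal_features(steady, hop_s=0.01)
```

In this toy segment the two partials each form a single track spanning all four frames, so the longest track lasts four hops and the only births are at 440 Hz and 880 Hz. A segment whose peaks jump erratically from frame to frame would instead produce many short tracks with widely scattered birth frequencies, which is the kind of contrast the two features are meant to capture.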
Cite this article
Shirazi, J., Ghaemmaghami, S. Improvement to speech-music discrimination using sinusoidal model based features. Multimed Tools Appl 50, 415–435 (2010). https://doi.org/10.1007/s11042-009-0416-3