Improvement to speech-music discrimination using sinusoidal model based features

Multimedia Tools and Applications

Abstract

This paper addresses model-based audio content analysis for classifying mixed speech-music audio signals into speech and music. A new feature set based on sinusoidal modeling of audio signals is presented and evaluated. The set includes the variance of the birth frequencies and the duration of the longest frequency track in the sinusoidal model, which serve as measures of harmony and signal continuity, and each feature is discussed in detail. These features are compared with typical features as inputs to an audio classifier. Their performance is evaluated by classifying audio into speech and music with both GMM (Gaussian Mixture Model) and SVM (Support Vector Machine) classifiers. Experimental results show that the proposed features are effective for speech/music discrimination: using only two sinusoidal model features, extracted from 1-s segments of the signal, we achieved 96.84% classification accuracy. Experimental comparisons also confirm that the sinusoidal model features outperform popular time-domain and frequency-domain features in audio classification.
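The two features named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact definitions, so everything here is an assumption: spectral peaks are taken as already extracted per analysis frame, tracks are formed by a simplified greedy nearest-frequency matching, and the function names, the 50 Hz continuation threshold (`max_jump_hz`), and the 10 ms hop (`hop_s`) are hypothetical choices, not the authors' settings.

```python
from statistics import pvariance


def track_partials(frames, max_jump_hz=50.0):
    """Greedy frame-to-frame peak matching (a simplified partial tracker).

    frames: list of lists of peak frequencies in Hz, one list per frame.
    Returns a list of tracks, each a dict with 'birth_freq' (Hz) and
    'length' (number of frames the track survived).
    """
    finished = []
    active = []  # tracks still alive at the current frame
    for peaks in frames:
        unmatched = sorted(peaks)  # copy, so the caller's data is untouched
        survivors = []
        for track in active:
            # Continue the track with the closest unclaimed peak, if close enough.
            best = min(unmatched,
                       key=lambda f: abs(f - track["last_freq"]),
                       default=None)
            if best is not None and abs(best - track["last_freq"]) <= max_jump_hz:
                unmatched.remove(best)
                track["last_freq"] = best
                track["length"] += 1
                survivors.append(track)
            else:
                finished.append(track)  # no continuation: the track dies
        # Every peak left unclaimed gives birth to a new track.
        for f in unmatched:
            survivors.append({"birth_freq": f, "last_freq": f, "length": 1})
        active = survivors
    return finished + active


def segment_features(frames, hop_s=0.01, max_jump_hz=50.0):
    """Return (variance of birth frequencies in Hz^2,
    duration of the longest track in seconds) for one segment."""
    tracks = track_partials(frames, max_jump_hz)
    if not tracks:
        return 0.0, 0.0
    birth_var = pvariance([t["birth_freq"] for t in tracks])
    longest_s = max(t["length"] for t in tracks) * hop_s
    return birth_var, longest_s


# Usage: a 1-s segment (100 frames at a 10 ms hop) holding two steady partials.
steady_segment = [[440.0, 880.0]] * 100
birth_var, longest_s = segment_features(steady_segment)
print(birth_var, longest_s)
```

Under these assumptions, a segment with sustained partials yields one long track per partial, while a segment whose peaks jump between frames yields many short tracks. The resulting two-element feature vector per segment could then be passed to any classifier, such as the GMM or SVM mentioned above.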



Author information

Corresponding author

Correspondence to Jalil Shirazi.

About this article

Cite this article

Shirazi, J., Ghaemmaghami, S. Improvement to speech-music discrimination using sinusoidal model based features. Multimed Tools Appl 50, 415–435 (2010). https://doi.org/10.1007/s11042-009-0416-3

  • DOI: https://doi.org/10.1007/s11042-009-0416-3
