Content-based singer classification on compressed domain audio data

Multimedia Tools and Applications

Abstract

In this paper, we propose a singer identification approach that automatically identifies the singer of unknown MP3 audio data. Unlike previous research on singer identification in the MP3 compressed domain, we use Mel-Frequency Cepstral Coefficients (MFCC) as the feature instead of MDCT (modified discrete cosine transform) coefficients. Although MFCC is widely used in music classification and speaker recognition, it cannot be obtained directly from compressed music data in formats such as MP3. We therefore introduce a modified method for calculating MFCC vectors in the MP3 compressed domain. The distribution of the MFCC vectors is described with a Gaussian mixture model (GMM). To find the nearest singer, we use maximum likelihood classification (MLC) to assign each input MFCC vector to its nearest group. The experimental results verify the feasibility of the proposed approach.
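The pipeline summarized in the abstract (MFCC computed from MP3-domain coefficients, a GMM per singer, and maximum likelihood classification) can be sketched roughly as follows. This is a minimal stdlib-only Python sketch under stated assumptions, not the authors' modified method. It assumes that the 576 dequantized MDCT coefficients of one MPEG-1 Layer III long-block granule are already available (decoding them from the bitstream is not shown), and it treats them as a linear-frequency spectrum in place of the usual FFT spectrum. It also assumes that each singer's GMM has diagonal covariances and was trained beforehand. The function names, the 20-filter/13-coefficient sizes, and the majority vote that turns per-vector decisions into one song-level label are all hypothetical, illustrative choices.

```python
import math
from collections import Counter

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_lines, sample_rate):
    """Triangular mel filters over n_lines equally spaced MDCT lines.
    Assumption: line k is centred at (k + 0.5) * (sample_rate / 2) / n_lines."""
    nyquist = sample_rate / 2.0
    step = hz_to_mel(nyquist) / (n_filters + 1)
    edges = [mel_to_hz(i * step) for i in range(n_filters + 2)]
    centres = [(k + 0.5) * nyquist / n_lines for k in range(n_lines)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in centres:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising edge of the triangle
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling edge of the triangle
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mdct_to_mfcc(mdct_lines, bank, n_ceps=13):
    """MFCC-like vector from one granule of real-valued MDCT coefficients (hypothetical helper)."""
    power = [c * c for c in mdct_lines]             # MDCT output is real, so square it
    log_e = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10) for row in bank]
    m = len(log_e)
    # DCT-II of the log filter-bank energies gives the cepstral coefficients
    return [sum(log_e[j] * math.cos(math.pi * i * (j + 0.5) / m) for j in range(m))
            for i in range(n_ceps)]

def gmm_log_likelihood(x, gmm):
    """log p(x) under a diagonal-covariance GMM given as [(weight, means, variances), ...]."""
    comp = []
    for w, mu, var in gmm:
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp.append(ll)
    top = max(comp)
    return top + math.log(sum(math.exp(c - top) for c in comp))  # log-sum-exp over components

def classify(vectors, singer_models):
    """Assign each vector to its most likely singer model, then take a majority vote."""
    votes = Counter(
        max(singer_models, key=lambda s: gmm_log_likelihood(x, singer_models[s]))
        for x in vectors
    )
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    bank = mel_filterbank(20, 576, 44100)
    granule = [math.sin(0.05 * k) / (1 + k) for k in range(576)]  # synthetic stand-in data
    print([round(c, 3) for c in mdct_to_mfcc(granule, bank)])
    toy = {"A": [(1.0, [0.0, 0.0], [1.0, 1.0])],
           "B": [(1.0, [5.0, 5.0], [1.0, 1.0])]}
    print(classify([[0.1, -0.2], [0.3, 0.0], [4.9, 5.2]], toy))  # prints "A"
```

The abstract only states that each input MFCC vector is assigned to its nearest group. Summing the frame log-likelihoods per singer and picking the maximum would be an equally reasonable song-level rule in place of the vote used here.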

Author information

Correspondence to Pei-Yun Liu.

About this article

Cite this article

Tsai, TH., Huang, YS., Liu, PY. et al. Content-based singer classification on compressed domain audio data. Multimed Tools Appl 74, 1489–1509 (2015). https://doi.org/10.1007/s11042-014-2189-6

  • DOI: https://doi.org/10.1007/s11042-014-2189-6
