Abstract
In this paper, we propose a singer identification approach that automatically identifies the singer of an unknown MP3 recording. Unlike previous research on singer identification in the MP3 compressed domain, which uses MDCT (modified discrete cosine transform) coefficients as features, we use Mel-Frequency Cepstral Coefficients (MFCC). Although MFCC is widely used in music classification and speaker recognition, it cannot be obtained directly from compressed music data such as MP3 files. We therefore introduce a modified method for calculating MFCC vectors in the MP3 compressed domain. The distribution of the MFCC vectors is modeled with a Gaussian mixture model (GMM). To find the nearest singer, we use maximum likelihood classification (MLC) to assign each input MFCC vector to its nearest group. Experimental results verify the feasibility of the proposed approach.
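The abstract does not spell out the modified MFCC computation, so what follows is only a minimal sketch under stated assumptions, written in Python with NumPy. It assumes the decoder has already Huffman-decoded and dequantized each MP3 granule into 576 MDCT coefficients, treats their squares as an approximate power spectrum with roughly uniform bin spacing, and then applies a conventional mel filterbank, logarithm, and DCT-II. The scoring part assumes diagonal-covariance GMMs that have already been trained per singer (weights, means, variances). Summing frame log-likelihoods into one song-level score is also an assumption; the paper's per-vector assignment could instead be aggregated by voting. All function names are hypothetical.

```python
import numpy as np

def mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def inv_mel(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular mel filters over n_bins bins covering 0..sample_rate/2."""
    # Assumption: MDCT bin k of an n_bins-line granule is centred near (k + 0.5) * fs / (2 * n_bins)
    bin_freqs = (np.arange(n_bins) + 0.5) * sample_rate / (2.0 * n_bins)
    edges = inv_mel(np.linspace(0.0, mel(sample_rate / 2.0), n_filters + 2))
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct2(x, n_out):
    """Unnormalised DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    j = np.arange(n)[None, :]
    return x @ np.cos(np.pi * k * (2 * j + 1) / (2 * n)).T

def mfcc_from_mdct(mdct_frames, sample_rate=44100, n_filters=26, n_ceps=13):
    """Hypothetical compressed-domain MFCC: mdct_frames has shape (n_frames, n_bins)."""
    power = mdct_frames ** 2                       # squared MDCT lines as a power estimate
    fb = mel_filterbank(n_filters, mdct_frames.shape[1], sample_rate)
    log_energies = np.log(power @ fb.T + 1e-10)    # small floor avoids log(0)
    return dct2(log_energies, n_ceps)              # (n_frames, n_ceps)

def gmm_log_likelihood(x, weights, means, variances):
    """Total log-likelihood of frames x (T, D) under a diagonal GMM with M components."""
    diff = x[:, None, :] - means[None, :, :]       # (T, M, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances[None], axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :])
    log_weighted = log_gauss + np.log(weights)[None, :]
    # log-sum-exp over components, then sum over frames (frames treated as independent)
    peak = log_weighted.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return per_frame.sum()

def identify_singer(x, singer_models):
    """Maximum-likelihood decision: the singer whose GMM gives x the highest score."""
    scores = {name: gmm_log_likelihood(x, *params) for name, params in singer_models.items()}
    return max(scores, key=scores.get)

rng = np.random.default_rng(0)
features = mfcc_from_mdct(rng.normal(size=(40, 576)))  # 40 synthetic granules
print(features.shape)  # (40, 13)
```

Summing per-frame log-likelihoods treats MFCC frames as independent, the usual simplification in GMM-based speaker and singer identification. Training the per-singer models (for example with the EM algorithm) is not shown.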
Cite this article
Tsai, TH., Huang, YS., Liu, PY. et al. Content-based singer classification on compressed domain audio data. Multimed Tools Appl 74, 1489–1509 (2015). https://doi.org/10.1007/s11042-014-2189-6