
Singer identification based on computational auditory scene analysis and missing feature methods

Published in: Journal of Intelligent Information Systems

Abstract

A major challenge in identifying singers from monaural popular music recordings is to remove or alleviate the influence of the accompaniment. Our system operates in two stages. In the first stage, we exploit computational auditory scene analysis (CASA) to segregate the singing-voice units from the mixture signal. First, the pitch of the singing voice is estimated and used to extract pitch-based features for each unit in an acoustic vector. These features are then used to estimate binary time-frequency (T-F) masks, where 1 indicates that the corresponding T-F unit is dominated by the singing voice and 0 indicates otherwise. Units dominated by the singing voice are considered reliable; the remaining units are unreliable, or missing, so the acoustic vector is incomplete. In the second stage, two missing-feature methods, reconstruction of the acoustic vector and marginalization, are used to identify the singer from the incomplete acoustic vectors. In the reconstruction method, the complete acoustic vector is first reconstructed and then converted to Gammatone frequency cepstral coefficients (GFCCs), which are used to identify the singer. In the marginalization method, the probability that the voice belongs to a given singer is computed from the reliable components only. We find that the reconstruction method outperforms the marginalization method, and that both perform well, especially at signal-to-accompaniment ratios (SARs) of 0 dB and −3 dB, compared with an existing system.
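The mask-then-marginalize idea described above can be sketched in a few lines. This is a minimal illustration under assumed names (`estimate_binary_mask`, `marginal_log_likelihood`) and a diagonal-covariance GMM singer model; it is not the authors' implementation, only the general missing-feature technique:

```python
import numpy as np

def estimate_binary_mask(voice_energy, accomp_energy):
    """Binary T-F mask: 1 where the singing voice dominates a unit,
    0 otherwise. In practice the per-unit energies would come from
    pitch-based segregation; here they are given directly."""
    return (voice_energy > accomp_energy).astype(int)

def marginal_log_likelihood(x, mask, weights, means, variances):
    """GMM log-likelihood of acoustic vector x for one singer model,
    computed only over the reliable (mask == 1) dimensions, i.e.
    marginalizing out the missing components of a diagonal GMM."""
    r = mask.astype(bool)
    xr = x[r]                 # reliable components, shape (R,)
    mu = means[:, r]          # (K, R): per-component means, reliable dims
    var = variances[:, r]     # (K, R): per-component variances
    # Per-component Gaussian log-density over reliable dims only
    log_comp = -0.5 * np.sum((xr - mu) ** 2 / var
                             + np.log(2 * np.pi * var), axis=1)
    log_comp += np.log(weights)
    # Numerically stable log-sum-exp over mixture components
    m = log_comp.max()
    return m + np.log(np.sum(np.exp(log_comp - m)))
```

Identification then amounts to scoring the incomplete vector against each singer's GMM this way and picking the singer with the highest marginal likelihood; unreliable units never enter the score.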



Author information

Correspondence to Guizhong Liu.


About this article

Cite this article

Hu, Y., Liu, G. Singer identification based on computational auditory scene analysis and missing feature methods. J Intell Inf Syst 42, 333–352 (2014). https://doi.org/10.1007/s10844-013-0271-6

