Singer identification based on computational auditory scene analysis and missing feature methods

Hu, Ying; Liu, Guizhong

doi:10.1007/s10844-013-0271-6


Published: 09 August 2013

Journal of Intelligent Information Systems, volume 42, pages 333–352 (2014)



Abstract

A major challenge in identifying singers from monaural popular music recordings is removing or alleviating the influence of the accompaniment. Our system works in two stages. In the first stage, we use computational auditory scene analysis (CASA) to segregate the singing voice units from the mixture signal. The pitch of the singing voice is estimated first and used to extract pitch-based features for each unit of an acoustic vector. These features are then used to estimate binary time-frequency (T-F) masks, where 1 indicates that the corresponding T-F unit is dominated by the singing voice and 0 indicates otherwise. Units dominated by the singing voice are considered reliable; all other units are treated as unreliable or missing, so the acoustic vector is incomplete. In the second stage, two missing feature methods, reconstruction of the acoustic vector and marginalization, are used to identify the singer from the incomplete acoustic vectors. In the reconstruction method, the complete acoustic vector is first reconstructed and then converted to Gammatone frequency cepstral coefficients (GFCCs), which are used to identify the singer. In the marginalization method, the probability that the voice belongs to a given singer is computed from the reliable components only. We find that the reconstruction method outperforms the marginalization method, and that both methods perform well compared with another system, especially at signal-to-accompaniment ratios (SARs) of 0 dB and −3 dB.
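The masking and marginalization steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the mask is computed from separately known voice and accompaniment energies (an oracle-style mask used only for illustration, whereas the paper estimates the mask from pitch-based features), each singer is modelled by a diagonal-covariance Gaussian mixture, and all function names (`binary_mask`, `marginal_log_likelihood`, `identify_singer`) are hypothetical.

```python
import numpy as np

def binary_mask(voice_energy, accomp_energy):
    """Return 1 where the voice energy exceeds the accompaniment energy, else 0.

    Assumption: both energies are known per T-F unit (illustration only).
    """
    return (np.asarray(voice_energy) > np.asarray(accomp_energy)).astype(int)

def marginal_log_likelihood(x, reliable, weights, means, variances):
    """Log-likelihood of one frame under a diagonal-covariance GMM,
    computed from the reliable components only.

    With diagonal covariances, integrating out the unreliable dimensions
    amounts to dropping them from each Gaussian.
    """
    r = np.asarray(reliable, dtype=bool)
    if not r.any():
        return 0.0  # no reliable unit: the frame adds no evidence
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)          # shape (K, D)
    variances = np.asarray(variances, dtype=float)  # shape (K, D)
    diff = x[r] - means[:, r]
    var = variances[:, r]
    log_comp = -0.5 * np.sum(np.log(2.0 * np.pi * var) + diff ** 2 / var, axis=1)
    a = np.log(np.asarray(weights, dtype=float)) + log_comp
    peak = a.max()  # log-sum-exp for numerical stability
    return float(peak + np.log(np.sum(np.exp(a - peak))))

def identify_singer(frames, masks, singer_models):
    """Pick the singer whose model gives the highest total marginal log-likelihood.

    singer_models maps a name to a (weights, means, variances) tuple.
    """
    scores = {}
    for name, (weights, means, variances) in singer_models.items():
        scores[name] = sum(
            marginal_log_likelihood(x, m, weights, means, variances)
            for x, m in zip(frames, masks)
        )
    return max(scores, key=scores.get)
```

In this sketch, a frame whose second component is swamped by accompaniment is scored on its first component alone, so a corrupted value in the masked dimension cannot pull the decision toward the wrong singer. The reconstruction method from the abstract would instead fill in the unreliable components before computing GFCCs; that step is not shown here.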



Author information

Authors and Affiliations

Xi’an Jiaotong University, Xi’an, China
Ying Hu & Guizhong Liu


Corresponding author

Correspondence to Guizhong Liu.


About this article

Cite this article

Hu, Y., Liu, G. Singer identification based on computational auditory scene analysis and missing feature methods. J Intell Inf Syst 42, 333–352 (2014). https://doi.org/10.1007/s10844-013-0271-6


Received: 03 April 2013
Revised: 29 June 2013
Accepted: 22 July 2013
Published: 09 August 2013
Issue Date: June 2014
DOI: https://doi.org/10.1007/s10844-013-0271-6
