Abstract
The effectiveness of audio content analysis for music retrieval may be enhanced by the use of available metadata. In the present work, observed differences in singing style and instrumentation across genres are used to adapt acoustic features for the singing voice detection task. Timbral descriptors traditionally used to discriminate singing voice from accompanying instruments are complemented by new features representing the temporal dynamics of source pitch and timbre. A method to isolate the dominant source spectrum serves to increase the robustness of the extracted features in the context of polyphonic audio. While demonstrating the effectiveness of combining static and dynamic features, experiments on a culturally diverse music database clearly indicate the value of adapting feature sets to genre-specific acoustic characteristics. Thus commonly available metadata, such as genre, can be useful in the front-end of an MIR system.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rao, V., Gupta, C., Rao, P. (2013). Context-Aware Features for Singing Voice Detection in Polyphonic Music. In: Detyniecki, M., García-Serrano, A., Nürnberger, A., Stober, S. (eds) Adaptive Multimedia Retrieval. Large-Scale Multimedia Retrieval and Evaluation. AMR 2011. Lecture Notes in Computer Science, vol 7836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37425-8_4
DOI: https://doi.org/10.1007/978-3-642-37425-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37424-1
Online ISBN: 978-3-642-37425-8
eBook Packages: Computer Science, Computer Science (R0)