Context-Aware Features for Singing Voice Detection in Polyphonic Music

Rao, Vishweshwara; Gupta, Chitralekha; Rao, Preeti

doi:10.1007/978-3-642-37425-8_4

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7836)

Included in the following conference series:

International Workshop on Adaptive Multimedia Retrieval

Abstract

The effectiveness of audio content analysis for music retrieval may be enhanced by the use of available metadata. In the present work, observed differences in singing style and instrumentation across genres are used to adapt acoustic features for the singing voice detection task. Timbral descriptors traditionally used to discriminate singing voice from accompanying instruments are complemented by new features representing the temporal dynamics of source pitch and timbre. A method to isolate the dominant source spectrum serves to increase the robustness of the extracted features in the context of polyphonic audio. While demonstrating the effectiveness of combining static and dynamic features, experiments on a culturally diverse music database clearly indicate the value of adapting feature sets to genre-specific acoustic characteristics. Thus, commonly available metadata, such as genre, can be useful in the front-end of a music information retrieval (MIR) system.

Author information

Authors and Affiliations

Department of Electrical Engineering, IIT Bombay, Mumbai, 76, India
Vishweshwara Rao, Chitralekha Gupta & Preeti Rao

Editor information

Editors and Affiliations

Laboratoire d’Informatique de Paris 6 (LIP6), Université Pierre et Marie Curie, 104, Avenue du Président Kennedy, 75016, Paris, France
Marcin Detyniecki
Departamento de Lenguajes y Sistemas Informáticos (LSI), Universidad Nacional de Educación a Distancia (UNED), 28040, Madrid, Spain
Ana García-Serrano
Faculty of Computer Science, Otto-von-Guericke University, Universitätsplatz 2, 39106, Magdeburg, Germany
Andreas Nürnberger & Sebastian Stober

Cite this paper

Rao, V., Gupta, C., Rao, P. (2013). Context-Aware Features for Singing Voice Detection in Polyphonic Music. In: Detyniecki, M., García-Serrano, A., Nürnberger, A., Stober, S. (eds) Adaptive Multimedia Retrieval. Large-Scale Multimedia Retrieval and Evaluation. AMR 2011. Lecture Notes in Computer Science, vol 7836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37425-8_4

DOI: https://doi.org/10.1007/978-3-642-37425-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37424-1
Online ISBN: 978-3-642-37425-8
eBook Packages: Computer Science, Computer Science (R0)
