Article

Singing voice detection using perceptually-motivated features

Authors:
Tin Lay Nwe, Institute for Infocomm Research, Singapore, Singapore
Haizhou Li, Institute for Infocomm Research, Singapore, Singapore

MM '07: Proceedings of the 15th ACM international conference on Multimedia, September 2007, Pages 309–312
https://doi.org/10.1145/1291233.1291299

Published: 29 September 2007

ABSTRACT

Perceptual features are motivated by human perception of sounds. In this paper, several perceptually-motivated features such as harmonic content, vibrato and timbre are studied to detect singing voice segments in a song. The singing formant and the attack-decay envelope of the sound are also studied for acoustic feature formulation. Cepstral coefficients, which reflect the timbre characteristics, are formulated by combining information from the harmonic content, vibrato, singing formant and attack-decay envelope of the sound. Bandpass filters that spread according to the octave frequency scale are used to extract vibrato and harmonic information. Several experiments are conducted using a database of 84 popular songs from commercially available CD recordings. The experiments show that the proposed feature formulation methods are effective.
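
The general idea of octave-scale subbands feeding cepstral coefficients can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the filter shapes, the number of bands, or the frequency range, so the rectangular bands, the function names and the example values below are all hypothetical.

```python
import math

def octave_band_edges(f_low, f_high):
    """Return (lower, upper) band edges in Hz; each band spans one octave.

    Assumption: one rectangular band per octave. The paper's real filter
    layout is not described in the abstract.
    """
    edges = []
    lo = f_low
    while lo < f_high:
        hi = min(lo * 2.0, f_high)
        edges.append((lo, hi))
        lo = hi
    return edges

def band_log_energies(power_spectrum, sample_rate, bands):
    """Sum the spectral power falling in each band and take the natural log.

    power_spectrum holds bins evenly spaced from 0 Hz to sample_rate / 2.
    """
    bin_hz = (sample_rate / 2.0) / (len(power_spectrum) - 1)
    energies = []
    for lo, hi in bands:
        total = sum(p for k, p in enumerate(power_spectrum)
                    if lo <= k * bin_hz < hi)
        energies.append(math.log(total + 1e-12))  # small floor avoids log(0)
    return energies

def cepstral_coefficients(log_energies, n_coeffs):
    """Unnormalized DCT-II of the log band energies."""
    n = len(log_energies)
    return [
        sum(e * math.cos(math.pi * c * (i + 0.5) / n)
            for i, e in enumerate(log_energies))
        for c in range(n_coeffs)
    ]

# Hypothetical usage: three octave bands between 110 Hz and 880 Hz.
bands = octave_band_edges(110.0, 880.0)
```

Because each band is twice as wide as the one below it, the low-frequency region gets finer resolution, which is where the first few harmonics of a sung note usually lie.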

Index Terms

Singing voice detection using perceptually-motivated features
1. Applied computing
  1. Arts and humanities
    1. Sound and music computing
2. Information systems
  1. Information retrieval
    1. Document representation
    2. Specialized information retrieval
      1. Multimedia and multimodal retrieval
        1. Music retrieval

Published in
MM '07: Proceedings of the 15th ACM international conference on Multimedia
September 2007
1115 pages
ISBN:9781595937025
DOI:10.1145/1291233
General Chairs:
Rainer Lienhart, University of Augsburg, Germany
Anand R. Prasad, DoCoMo Euro-Labs, Germany
Program Chairs:
Alan Hanjalic, Delft University of Technology, The Netherlands
Sunghyun Choi, Seoul National University, South Korea
Brian Bailey, University of Illinois at Urbana-Champaign
Nicu Sebe, University of Amsterdam, The Netherlands
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
harmonic
singing formant
singing voice
timbre
vibrato
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 995 of 4,171 submissions, 24%