ABSTRACT
Perceptual features are motivated by the way humans perceive sound. This paper studies several perceptually motivated features, namely harmonic content, vibrato, and timbre, for detecting singing voice segments in a song. The singing formant and the attack-decay envelope of the sound are also studied for acoustic feature formulation. Cepstral coefficients, which reflect timbre characteristics, are formulated by combining information from the harmonic content, vibrato, singing formant, and attack-decay envelope. Bandpass filters spaced according to the octave frequency scale are used to extract vibrato and harmonic information. Experiments are conducted on a database of 84 popular songs taken from commercially available CD recordings. The experiments show that the proposed feature formulation methods are effective.
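To make the octave-scale filter layout and the cepstral step more concrete, the following is a minimal sketch, not the authors' implementation. The abstract gives no filter parameters, so the frequency range, the function names `octave_band_edges` and `cepstral_coefficients`, and the use of a plain DCT-II over log band energies are all assumptions chosen for illustration.

```python
import math


def octave_band_edges(f_low, f_high):
    """Return (low, high) band edges in Hz, each band spanning one octave.

    Assumption: bands start at f_low and double until f_high is reached;
    the paper's actual band layout is not stated in the abstract.
    """
    if f_low <= 0 or f_high <= f_low:
        raise ValueError("need 0 < f_low < f_high")
    bands = []
    lo = f_low
    while lo * 2 <= f_high:
        bands.append((lo, lo * 2))
        lo *= 2
    return bands


def cepstral_coefficients(band_energies, n_coeffs):
    """Compute cepstral coefficients as the DCT-II of log band energies.

    Assumption: band_energies holds one strictly positive energy value
    per bandpass filter output for a single analysis frame.
    """
    m = len(band_energies)
    logs = [math.log(e) for e in band_energies]
    return [
        sum(logs[j] * math.cos(math.pi * k * (j + 0.5) / m) for j in range(m))
        for k in range(n_coeffs)
    ]


# Hypothetical usage: octave bands between 62.5 Hz and 8 kHz,
# then coefficients for one frame of made-up band energies.
bands = octave_band_edges(62.5, 8000.0)
frame_energies = [1.0 + i for i in range(len(bands))]
coeffs = cepstral_coefficients(frame_energies, 4)
```

With these assumed limits the range divides into seven octave bands. In a full system each band would be realised as an actual bandpass filter, and the per-band energies would be measured frame by frame before the cepstral step.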
Index Terms
- Singing voice detection using perceptually-motivated features