Abstract:
Transcribing the singing voice into music notes is challenging due to pitch fluctuations such as portamenti and vibratos. This paper presents a probabilistic transcription method for monophonic sung melodies that explicitly accounts for these local pitch fluctuations. In the hierarchical Hidden Markov Model (HMM), an upper-level ergodic HMM handles the transitions between notes, while a lower-level left-to-right HMM handles the intra- and inter-note pitch fluctuations. The lower-level HMM employs the pitch dynamic model, which explicitly expresses the pitch-curve characteristics as an observation likelihood over f0 and Δf0 using a compact parametric distribution. A histogram-based tuning frequency estimation method, together with post-processing heuristics that separate merged notes and allocate spuriously detected short notes, further improves note recognition performance. With model parameters that support intuitions about singing behavior, the proposed method obtained encouraging results when evaluated on a published monophonic sung-melody dataset and compared with state-of-the-art methods.
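
To illustrate the kind of tuning step the abstract mentions, below is a minimal Python sketch of a histogram-based tuning frequency estimate over a monophonic f0 track. The function name, bin count, and the A4 = 440 Hz reference are assumptions made for illustration only; the paper's exact procedure may differ.

    # Minimal sketch of a histogram-based tuning frequency estimate
    # (illustrative assumption, not the paper's exact algorithm).
    # Given voiced f0 values in Hz, histogram the fractional deviation
    # from the nearest equal-tempered semitone and report the most
    # common deviation as the tuning offset relative to A4 = 440 Hz.
    import numpy as np

    def estimate_tuning_offset(f0_hz, n_bins=100):
        """Return an estimated tuning offset in cents relative to A4 = 440 Hz."""
        f0_hz = np.asarray(f0_hz, dtype=float)
        f0_hz = f0_hz[f0_hz > 0]                      # keep voiced frames only
        midi = 69.0 + 12.0 * np.log2(f0_hz / 440.0)   # continuous MIDI pitch
        deviation = midi - np.round(midi)             # semitone deviation in [-0.5, 0.5)
        hist, edges = np.histogram(deviation, bins=n_bins, range=(-0.5, 0.5))
        centres = 0.5 * (edges[:-1] + edges[1:])
        return 100.0 * centres[np.argmax(hist)]       # modal deviation, in cents

    # Usage sketch: offset = estimate_tuning_offset(f0_track); the f0 track could
    # then be shifted by -offset cents before note-level quantization.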
Published in: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 05-09 March 2017
Date Added to IEEE Xplore: 19 June 2017
Electronic ISSN: 2379-190X