Abstract
In this paper, a time-domain adaptive filtering-based melody extraction method is proposed. The proposed method works in multiple stages to extract the vocal melody (the singer’s fundamental frequency) from vocal polyphonic music signals. The vocal and non-vocal regions of the music signal are identified by the strength of excitation of the source signal. The vocal regions are further segmented into a sequence of notes by detecting note onsets in the frequency-domain representation of the composite signal. The melody contour in each vocal note segment is obtained by adaptive zero-frequency filtering in the time domain. The performance of the proposed method is compared with the current state-of-the-art melody extraction method in terms of voicing recall rate, voicing false alarm rate, raw pitch accuracy, and overall accuracy.
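The core building blocks named above, zero-frequency filtering (ZFF) and strength of excitation (SoE), come from the epoch-extraction method of Murty and Yegnanarayana (2008). The Python sketch below illustrates only those building blocks on a synthetic tone; it is not the authors' full pipeline. The function names `zff_signal`, `epochs_and_soe` and `f0_from_epochs`, the window of 1.5 pitch periods, the three trend-removal passes and the 200-sample edge guard are all illustrative assumptions, and the per-segment `win_ms` argument merely stands in for the paper's adaptation rule, which the abstract does not specify.

```python
import math
from itertools import accumulate
from statistics import median


def _subtract_local_mean(y, half):
    """Subtract a (2*half + 1)-sample moving average; windows are truncated at the edges."""
    prefix = [0.0] + list(accumulate(y))
    out = []
    for n in range(len(y)):
        lo, hi = max(0, n - half), min(len(y), n + half + 1)
        out.append(y[n] - (prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def zff_signal(s, fs, win_ms):
    """Zero-frequency filtered signal (sketch after Murty & Yegnanarayana, 2008).

    win_ms: trend-removal window in ms, assumed to be about 1-2 pitch periods of
    the segment. Choosing it per note segment is a hypothetical stand-in for the
    "adaptive" step; the paper's exact rule is not reproduced here.
    """
    # Difference the input to remove any DC offset (x[0] = s[0]).
    x = [s[0]] + [s[n] - s[n - 1] for n in range(1, len(s))]
    # Two cascaded ideal zero-frequency resonators, each 1 / (1 - z^-1)^2,
    # are equivalent to four successive cumulative sums.
    y = x
    for _ in range(4):
        y = list(accumulate(y))
    # The output grows polynomially; remove the trend by subtracting a local
    # mean, repeated three times (assumed pass count).
    half = max(1, round(win_ms * 1e-3 * fs / 2))
    for _ in range(3):
        y = _subtract_local_mean(y, half)
    return y


def epochs_and_soe(z):
    """Negative-to-positive zero crossings of the ZFF signal (epochs) and the
    slope at each crossing, used here as the strength of excitation (SoE)."""
    epochs, soe = [], []
    for n in range(1, len(z)):
        if z[n - 1] < 0.0 <= z[n]:
            epochs.append(n)
            soe.append(z[n] - z[n - 1])
    return epochs, soe


def f0_from_epochs(epochs, fs, n_samples, guard):
    """Median F0 in Hz from epoch intervals, ignoring `guard` samples at each
    end, where the truncated averaging windows distort the ZFF signal."""
    inner = [e for e in epochs if guard <= e < n_samples - guard]
    intervals = [b - a for a, b in zip(inner, inner[1:])]
    return fs / median(intervals)


# Demo on a synthetic 200 Hz "voiced" tone with five decaying harmonics.
fs, f0 = 8000, 200.0
s = [sum(math.sin(2 * math.pi * h * f0 * n / fs) / h for h in range(1, 6))
     for n in range(4000)]
z = zff_signal(s, fs, win_ms=1.5 * 1000 / f0)
epochs, soe = epochs_and_soe(z)
print(round(f0_from_epochs(epochs, fs, len(s), guard=200), 1))
```

Because the four integrators heavily attenuate higher harmonics, the ZFF output is dominated by the fundamental, so one positive-going zero crossing per period is expected and the demo should recover the 200 Hz fundamental. In the method described in the abstract, this filtering would be applied per vocal note segment, and SoE values would be used to separate vocal from non-vocal regions; the SoE threshold and the onset detector are not given in the abstract and are deliberately left out of this sketch.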
Acknowledgements
The present work was carried out under the project entitled “Scientific Approach to Networking and Designing of Heritage Interfaces (SANDHI)”, sponsored by the Ministry of Human Resource Development (MHRD), Govt. of India (project reference IIT/SRIC/R/ITA/2014/40, dated March 24, 2014). We would like to thank Google (Google PhD Fellowship) and the Department of Information Technology (DIT), Govt. of India, for financial support. We would also like to thank Prof. Pallab Das Gupta (Dept. of Computer Science and Engineering, IIT Kharagpur), Prof. Priyadarshi Patnaik (Dept. of Humanities, IIT Kharagpur), and Ms. Gowri (professional Hindustani music vocalist) for providing deeper theoretical insight into Hindustani music.
Cite this article
Gurunath Reddy, M., Sreenivasa Rao, K. Predominant Melody Extraction from Vocal Polyphonic Music Signal by Time-Domain Adaptive Filtering-Based Method. Circuits Syst Signal Process 37, 2911–2933 (2018). https://doi.org/10.1007/s00034-017-0696-1
DOI: https://doi.org/10.1007/s00034-017-0696-1