Abstract
In Hindustani classical music, notes and their variations play an important role in evoking the aesthetic qualities of a rãga. Detecting notes is therefore essential for characterizing a rãga, but the task is challenging because of improvisations and ornamentations. In this work, the melody contour is extracted from the music file using a salience-based predominant melody extraction method. Initially, notes are determined by optimizing a tolerance band and a minimum note duration. The output of this initial note transcription system consists of the notes, their durations, and their boundaries (onset and offset instants). To improve the accuracy of the initial transcription, the melody contour is divided into melodic segments, which are grouped into four broad categories based on the duration and transition characteristics of the initially transcribed notes. Different features and classification models are explored for classifying the melodic segments into desired and undesired categories. Finally, two metrics are proposed to measure the performance of the transcription system.
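The initial note-determination step described above (quantizing the melody contour with a tolerance band and a minimum-duration constraint) can be sketched roughly as follows. This is an illustrative sketch only: the function name, the hop size, and the tolerance and duration values are assumptions for demonstration, not the settings optimized in the paper.

```python
import numpy as np

def transcribe_notes(f0_hz, tonic_hz, hop_s=0.01, tol_cents=35.0, min_dur_s=0.05):
    """Quantize a frame-wise pitch contour (Hz) into notes relative to the tonic.

    A frame counts toward a note only if it lies within `tol_cents` of the
    nearest semitone; a run of such frames becomes a note only if it lasts
    at least `min_dur_s`. Returns (note_cents, onset_s, offset_s) tuples.
    Parameter values here are illustrative, not the paper's.
    """
    cents = 1200.0 * np.log2(np.asarray(f0_hz, dtype=float) / tonic_hz)
    nearest = np.round(cents / 100.0) * 100.0      # nearest semitone, in cents
    inside = np.abs(cents - nearest) <= tol_cents  # frames within the band

    notes, start = [], None
    for i, ok in enumerate(inside):
        if ok and start is not None and nearest[i] == nearest[start]:
            continue  # same note continues
        if start is not None:  # close the previous run
            if (i - start) * hop_s >= min_dur_s:
                notes.append((nearest[start], start * hop_s, i * hop_s))
        start = i if ok else None
    if start is not None:  # close a run that reaches the end of the contour
        if (len(inside) - start) * hop_s >= min_dur_s:
            notes.append((nearest[start], start * hop_s, len(inside) * hop_s))
    return notes
```

For example, a contour holding the tonic for 0.1 s and then a tone two semitones higher for 0.1 s yields two notes at 0 and 200 cents with their onset and offset times. Short excursions outside the tolerance band (as in ornaments) produce no note, which is what motivates the paper's subsequent segment-level refinement.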

















Dhara, P., Rao, K.S. Automatic note transcription system for Hindustani classical music. Int J Speech Technol 21, 987–1003 (2018). https://doi.org/10.1007/s10772-018-9554-1