Abstract
The challenge of transcribing audio into symbolic notation is a well-known problem in music information retrieval. In this work, we explore a novel task: automatic music transcription for Beatbox sounds, also known as vocal percussion. Because Beatbox sounds cannot be created synthetically, they inherently vary within the same speaker as well as across different speakers. To address this, we propose BaDumTss, which employs a pretraining strategy on top of a novel sequence traversal method, ensuring robustness and efficiency on new Beatbox sequences. Furthermore, BaDumTss is agnostic to time-based stretches and warps, as well as amplitude changes, in the Beatbox sequence. It predicts both onsets and frames in a multi-task manner, achieving 56% and 326% relative improvements in frame-level and onset-level F1 scores, respectively, over the best-performing baseline. We also release an annotated dataset of monophonic Beatbox sequences along with their corresponding MIDI labels, the first of its kind, comprising Beatbox samples with variations such as time stretches, pitch shifts, and added noise.
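The variations mentioned above (time stretches, pitch shifts, added noise, and amplitude changes) can be reproduced with standard audio tooling. Below is a minimal sketch using librosa and soundfile; the input file name and the specific parameter values are illustrative assumptions, not the authors' exact augmentation pipeline.

```python
import numpy as np
import librosa
import soundfile as sf

# Load a (hypothetical) monophonic Beatbox recording at its native sample rate.
y, sr = librosa.load("beatbox.wav", sr=None, mono=True)

# Time stretch: rate > 1 speeds the sequence up, rate < 1 slows it down.
y_stretch = librosa.effects.time_stretch(y, rate=1.1)

# Pitch shift by +2 semitones.
y_shift = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)

# Additive Gaussian noise, scaled relative to the signal's peak amplitude.
noise = np.random.normal(scale=0.005 * np.max(np.abs(y)), size=len(y))
y_noisy = y + noise

# Amplitude change (simple gain), one of the variations the model is agnostic to.
y_quiet = 0.5 * y

# Write the augmented variants next to the original.
for name, sig in [("stretch", y_stretch), ("shift", y_shift),
                  ("noise", y_noisy), ("quiet", y_quiet)]:
    sf.write(f"beatbox_{name}.wav", sig, sr)
```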
P. Mehta and M. Maheshwari—Equal contribution.
Notes
1. The source code and dataset are available at https://github.com/LCS2-IIITD/BaDumTss-PAKDD22.
Acknowledgement
The authors would like to acknowledge the support of the Ramanujan Fellowship (SERB, India), Infosys Centre for AI (CAI) at IIIT-Delhi, and ihub-Anubhuti-iiitd Foundation set up under the NM-ICPS scheme of the Department of Science and Technology, India.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mehta, P., Maheshwari, M., Joshi, B., Chakraborty, T.: BaDumTss: Multi-task Learning for Beatbox Transcription. In: Gama, J., Li, T., Yu, Y., Chen, E., Zheng, Y., Teng, F. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2022. Lecture Notes in Computer Science, vol. 13282. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-05981-0_14
DOI: https://doi.org/10.1007/978-3-031-05981-0_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05980-3
Online ISBN: 978-3-031-05981-0
eBook Packages: Computer Science, Computer Science (R0)