Abstract
The challenge of transcribing audio into symbolic notation is a well-known problem in music information retrieval. In this work, we explore a novel task: automatic music transcription for Beatbox sounds, also known as vocal percussion. Because Beatbox sounds cannot be created synthetically, they inherently vary within the same speaker as well as across different speakers. To address this, we propose BaDumTss, which employs a pretraining strategy on top of a novel sequence traversal method, ensuring robustness and efficiency on new Beatbox sequences. Furthermore, BaDumTss is agnostic to time-based stretches and warps, as well as amplitude changes, in the Beatbox sequence. It predicts both onsets and frames in a multi-task manner, achieving 56% and 326% relative improvements in frame-level and onset-level F1 scores, respectively, over the best-performing baseline. We also release an annotated dataset of monophonic Beatbox sequences along with their corresponding MIDI labels, the first of its kind, comprising Beatbox samples with variations such as time stretches, pitch shifts, and added noise.
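The variations mentioned above (time stretches, pitch shifts, added noise, and amplitude changes) can be reproduced with standard audio tooling. Below is a minimal sketch using librosa and soundfile; the input file name and the specific parameter values are illustrative assumptions, not the authors' exact augmentation pipeline.

```python
import numpy as np
import librosa
import soundfile as sf

# Load a (hypothetical) monophonic Beatbox recording at its native sample rate.
y, sr = librosa.load("beatbox.wav", sr=None, mono=True)

# Time stretch: rate > 1 speeds the sequence up, rate < 1 slows it down.
y_stretch = librosa.effects.time_stretch(y, rate=1.1)

# Pitch shift by +2 semitones.
y_shift = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)

# Additive Gaussian noise, scaled relative to the signal's peak amplitude.
noise = np.random.normal(scale=0.005 * np.max(np.abs(y)), size=len(y))
y_noisy = y + noise

# Amplitude change (simple gain), one of the variations the model is agnostic to.
y_quiet = 0.5 * y

# Write the augmented variants next to the original.
for name, sig in [("stretch", y_stretch), ("shift", y_shift),
                  ("noise", y_noisy), ("quiet", y_quiet)]:
    sf.write(f"beatbox_{name}.wav", sig, sr)
```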
P. Mehta and M. Maheshwari—Equal contribution.
Notes
1. The source code and dataset are available at https://github.com/LCS2-IIITD/BaDumTss-PAKDD22.
Acknowledgement
The authors would like to acknowledge the support of the Ramanujan Fellowship (SERB, India), Infosys Centre for AI (CAI) at IIIT-Delhi, and ihub-Anubhuti-iiitd Foundation set up under the NM-ICPS scheme of the Department of Science and Technology, India.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mehta, P., Maheshwari, M., Joshi, B., Chakraborty, T.: BaDumTss: Multi-task Learning for Beatbox Transcription. In: Gama, J., Li, T., Yu, Y., Chen, E., Zheng, Y., Teng, F. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2022. Lecture Notes in Computer Science, vol. 13282. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-05981-0_14
DOI: https://doi.org/10.1007/978-3-031-05981-0_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05980-3
Online ISBN: 978-3-031-05981-0
eBook Packages: Computer Science, Computer Science (R0)