
BaDumTss: Multi-task Learning for Beatbox Transcription

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2022)

Abstract

The challenge of transcribing audio into symbolic notation is a well-known problem in music information retrieval. In this work, we explore a novel task: automatic music transcription for Beatbox sounds, also known as vocal percussion. As Beatbox sounds cannot be created synthetically, they inherently vary within the same speaker as well as across different speakers. To address this, we propose BaDumTss, which applies a pretraining strategy over a novel sequence traversal method, ensuring robustness and efficiency on new Beatbox sequences. Furthermore, BaDumTss is agnostic to time-based stretches and warps, as well as amplitude changes, in a Beatbox sequence. It predicts both onsets and frame-set in a multi-task manner, gaining 56% and 326% relative improvements in frame-set and onset-level F1 scores, respectively, over the best-performing baseline. We also release an annotated dataset of monophonic Beatbox sequences along with their corresponding MIDI labels, the first of its kind comprising Beatbox samples with variations such as time stretches, pitch shifts, and added noise.
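The paper's architecture is not reproduced on this page, but the multi-task objective the abstract describes, predicting onsets and frames jointly, is commonly formulated as a sum of per-task binary cross-entropy terms over piano-roll-like targets. The sketch below is an illustrative assumption in that spirit (the shapes, weighting, and function names are not taken from the paper):

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Element-wise binary cross-entropy, averaged over all cells."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def multitask_loss(onset_pred, frame_pred, onset_true, frame_true, onset_weight=1.0):
    """Joint objective: one BCE term per task, summed.

    All arguments are (time, classes) arrays of probabilities / binary
    labels; `onset_weight` balances the two tasks and is an illustrative
    assumption, not a value from the paper.
    """
    return onset_weight * bce(onset_pred, onset_true) + bce(frame_pred, frame_true)

# Toy example: 4 time frames x 3 percussion classes.
onset_true = np.array([[1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=float)
frame_true = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]], dtype=float)
# Slightly "softened" predictions standing in for model outputs.
loss = multitask_loss(onset_true * 0.9 + 0.05, frame_true * 0.9 + 0.05,
                      onset_true, frame_true)
```

With these near-correct predictions every cell contributes the same penalty, so the loss reduces to twice the single-cell cross-entropy; a real training loop would backpropagate this through a shared encoder with two output heads.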

P. Mehta and M. Maheshwari—Equal contribution.


Notes

  1. The source code and dataset are available at https://github.com/LCS2-IIITD/BaDumTss-PAKDD22.
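The released dataset includes time-stretched, pitch-shifted, and noise-added variants of each sequence. Two of these augmentations can be sketched with plain numpy as below; the function names, the linear-interpolation stretch, and the SNR parameterization are illustrative assumptions, not the paper's pipeline (audio toolkits such as librosa are the usual choice for phase-vocoder stretching and pitch shifting):

```python
import numpy as np

def time_stretch(y, rate):
    """Naive time stretch by resampling with linear interpolation.

    rate > 1 shortens the clip, rate < 1 lengthens it. (A real pipeline
    would typically use a phase vocoder to preserve pitch.)
    """
    n_out = int(round(len(y) / rate))
    x_old = np.linspace(0.0, 1.0, num=len(y))
    x_new = np.linspace(0.0, 1.0, num=n_out)
    return np.interp(x_new, x_old, y)

def add_noise(y, snr_db, seed=None):
    """Add white Gaussian noise at a given signal-to-noise ratio (dB)."""
    rng = np.random.default_rng(seed)
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return y + rng.normal(0.0, np.sqrt(noise_power), size=len(y))

t = np.linspace(0, 1, 8000)
clip = np.sin(2 * np.pi * 220 * t)          # stand-in for a beatbox hit
stretched = time_stretch(clip, rate=1.25)   # ~20% shorter
noisy = add_noise(clip, snr_db=20, seed=0)
```

Applying such perturbations to each recording is what lets a transcription model learn invariance to the time-stretch and amplitude variations the abstract mentions.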


Acknowledgement

The authors would like to acknowledge the support of the Ramanujan Fellowship (SERB, India), Infosys Centre for AI (CAI) at IIIT-Delhi, and ihub-Anubhuti-iiitd Foundation set up under the NM-ICPS scheme of the Department of Science and Technology, India.

Author information

Corresponding author

Correspondence to Priya Mehta.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Mehta, P., Maheshwari, M., Joshi, B., Chakraborty, T. (2022). BaDumTss: Multi-task Learning for Beatbox Transcription. In: Gama, J., Li, T., Yu, Y., Chen, E., Zheng, Y., Teng, F. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2022. Lecture Notes in Computer Science, vol 13282. Springer, Cham. https://doi.org/10.1007/978-3-031-05981-0_14

  • DOI: https://doi.org/10.1007/978-3-031-05981-0_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-05980-3

  • Online ISBN: 978-3-031-05981-0

  • eBook Packages: Computer Science, Computer Science (R0)
