PureMIC: A New Audio Dataset for the Classification of Musical Instruments based on Convolutional Neural Networks

Published in: Journal of Signal Processing Systems

Abstract

Automatic classification of musical instruments from audio relies heavily on datasets of acoustic recordings of the instruments to train instrument models. This requires precise labels for each instrument event, yet such labels are very difficult to obtain, especially in polyphonic performances. OpenMIC-2018 is a polyphonic dataset created specifically to train instrument models; however, it is based on weak and incomplete labels. The automatic classification of sound events based on the VGGish bottleneck layer, as proposed for AudioSet, classifies only one second of audio at a time, making it hard to assign a label to that exact moment. To address this problem, this paper proposes PureMIC, a new strongly labeled dataset (SLD) that isolates 1000 manually labeled single-instrument clips. Moreover, the proposed model classifies clips over time, and this temporal resolution is used to enhance the labeling robustness of a large number of unlabeled samples in OpenMIC-2018. The paper disambiguates and reports the automatic labeling of these previously unlabeled samples. The proposed new labels achieve a mean average precision (mAP) of 0.701 on the OpenMIC test data, outperforming the baseline (0.66). The code is released online so that the research community can replicate and build on the proposed implementation.
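As a rough illustration of the pipeline described above (not the paper's actual CNN), the sketch below shows how per-second VGGish-style embeddings might be classified per instrument, max-pooled over time into clip-level predictions, and scored with mean average precision using scikit-learn. The array sizes, the random stand-in data, and the logistic-regression classifier are assumptions made only for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
n_clips, secs, emb_dim, n_inst = 200, 10, 128, 20   # hypothetical sizes
n_train = 150                                        # first clips used for training

# Random stand-ins for per-second VGGish embeddings and clip-level multi-hot labels.
X = rng.normal(size=(n_clips, secs, emb_dim))
Y = (rng.random(size=(n_clips, n_inst)) > 0.8).astype(int)

# Frame-level training: every one-second embedding inherits its clip's (weak) labels.
X_frames = X.reshape(-1, emb_dim)
Y_frames = np.repeat(Y, secs, axis=0)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_frames[: n_train * secs], Y_frames[: n_train * secs])

# Per-second predictions, max-pooled over time to obtain clip-level scores.
probs = clf.predict_proba(X_frames[n_train * secs:])
clip_scores = probs.reshape(n_clips - n_train, secs, n_inst).max(axis=1)

# Mean average precision across instruments, the metric reported for OpenMIC.
print("mAP:", average_precision_score(Y[n_train:], clip_scores, average="macro"))

In the paper, the classifier is a convolutional network trained on PureMIC rather than a logistic regression; the point of this sketch is only the frame-to-clip aggregation over time and the mAP evaluation.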


Notes

  1. https://github.com/goncaloCB/Musical-Instrument-labelling

  2. https://www.youtube.com/embed/0rA7Sr4WfEU?start=60&end=70

  3. https://www.youtube.com/watch?v=IfEgdajfG0M

  4. https://www.youtube.com/embed/3S32ZCgQMTk?start=300&end=310


Acknowledgements

This research was funded by Instituto de Telecomunicações and Fundação para a Ciência e a Tecnologia under grant UIDB/50008/2020.

Author information

Correspondence to Gonçalo Castel-Branco.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Castel-Branco, G., Falcao, G. & Perdigão, F. PureMIC: A New Audio Dataset for the Classification of Musical Instruments based on Convolutional Neural Networks. J Sign Process Syst 93, 977–987 (2021). https://doi.org/10.1007/s11265-021-01661-3

