
Boost Predominant Instrument Recognition Performance with MagiaSearch and MagiaClassifier

  • Conference paper
  • Artificial Neural Networks and Machine Learning – ICANN 2023 (ICANN 2023)

Abstract

The objective of this study is to overcome the performance limitations of existing instrument recognition systems in a cost-effective manner. Accurately identifying predominant instruments is a critical problem in music information retrieval, and it directly affects the performance of various advanced techniques that depend on it. To address this, we propose a novel instrument recognition system that integrates two components: MagiaSearch, a fast search technique for finding reliable SpecAugment parameters for instrument recognition, and MagiaClassifier, a deep neural network classifier that uses Swin Transformer V2 as its backbone. Our experiments demonstrate that MagiaSearch effectively finds reliable SpecAugment parameters for log mel spectrograms of instrument audio, that MagiaClassifier improves instrument recognition performance, and that combining MagiaSearch and MagiaClassifier achieves an accuracy of 88.76% on the 11-class predominant instrument recognition task of the IRMAS dataset.
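The abstract states that MagiaSearch tunes SpecAugment parameters applied to log mel spectrograms, but it does not describe the search procedure itself. The following is therefore only a minimal sketch under stated assumptions: `spec_augment` implements the frequency and time masking of SpecAugment (Park et al., 2019), omitting time warping, with mask widths drawn from [0, F] and [0, T]. `random_search` is a generic, hypothetical random search over (F, T) driven by a user-supplied `evaluate` callback. Neither function is the authors' MagiaSearch or MagiaClassifier, and all names and parameters here are illustrative.

```python
import numpy as np


def spec_augment(log_mel, freq_mask_param, time_mask_param,
                 num_freq_masks=1, num_time_masks=1, rng=None):
    """SpecAugment-style frequency and time masking (time warping omitted).

    log_mel: array of shape (n_mels, n_frames).
    freq_mask_param (F): maximum width of each frequency mask.
    time_mask_param (T): maximum width of each time mask.
    Masked cells are set to the spectrogram mean, which is equivalent to
    zero-masking when the input is mean-normalized.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(log_mel, dtype=float, copy=True)  # never modify the input
    n_mels, n_frames = out.shape
    fill = out.mean()

    for _ in range(num_freq_masks):
        f = rng.integers(0, min(freq_mask_param, n_mels) + 1)    # width f in [0, F]
        f0 = rng.integers(0, n_mels - f + 1)                      # start in [0, n_mels - f]
        out[f0:f0 + f, :] = fill

    for _ in range(num_time_masks):
        t = rng.integers(0, min(time_mask_param, n_frames) + 1)  # width t in [0, T]
        t0 = rng.integers(0, n_frames - t + 1)                    # start in [0, n_frames - t]
        out[:, t0:t0 + t] = fill

    return out


def random_search(evaluate, f_choices, t_choices, n_trials, rng=None):
    """Hypothetical random search over (F, T); NOT the paper's MagiaSearch.

    evaluate(F, T) -> score to maximize (e.g. validation accuracy).
    Returns (best_score, (F, T)).
    """
    rng = np.random.default_rng() if rng is None else rng
    best = None
    for _ in range(n_trials):
        F = int(rng.choice(f_choices))
        T = int(rng.choice(t_choices))
        score = evaluate(F, T)
        if best is None or score > best[0]:
            best = (score, (F, T))
    return best
```

In a real pipeline, `evaluate` would train or fine-tune the classifier (in the paper, a Swin Transformer V2 backbone) with the candidate mask widths and return validation accuracy. Random sampling is used here only as a cheap baseline for settings where each evaluation is expensive.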




Author information

Corresponding author: Binhui Wang.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zhou, H., Li, Z., Xing, S., Gu, Z., Wang, B. (2023). Boost Predominant Instrument Recognition Performance with MagiaSearch and MagiaClassifier. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14261. Springer, Cham. https://doi.org/10.1007/978-3-031-44198-1_11


  • DOI: https://doi.org/10.1007/978-3-031-44198-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44197-4

  • Online ISBN: 978-3-031-44198-1

  • eBook Packages: Computer Science, Computer Science (R0)
