Abstract
This study aims to overcome the performance limitations of existing instrument recognition systems in a cost-effective manner. Accurately identifying predominant instruments is a critical problem in music information retrieval, and it directly affects the performance of many downstream techniques. To address it, we propose a novel instrument recognition system that integrates a fast search technique, named MagiaSearch, which discovers reliable SpecAugment parameters for instrument recognition, with a deep neural network classifier, named MagiaClassifier, which uses Swin Transformer V2 as its backbone. Our experiments demonstrate that MagiaSearch effectively finds reliable SpecAugment parameters for the log mel spectrograms of instrument audio, that MagiaClassifier improves instrument recognition performance, and that combining MagiaSearch and MagiaClassifier achieves an accuracy of 88.76% on the 11-category predominant instrument recognition task of the IRMAS dataset.
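To make the pipeline described in the abstract concrete, the snippet below is a minimal sketch of how such a system could be assembled in Python with torchaudio and timm: log mel spectrograms, SpecAugment-style frequency/time masking with searchable mask widths, a Swin Transformer V2 backbone, and a small random search over the masking parameters. It is illustrative only, not the authors' MagiaSearch or MagiaClassifier implementation; the backbone variant (swinv2_tiny_window8_256), the spectrogram settings, the search ranges, and the evaluate_candidate scoring function are assumptions introduced here.

```python
# Illustrative sketch only: placeholder spectrogram settings, backbone choice,
# and scoring; not the paper's MagiaSearch/MagiaClassifier code.
import random
import torch
import torch.nn.functional as F
import torchaudio
import timm

# Log mel spectrogram front end (parameter values are illustrative).
to_log_mel = torch.nn.Sequential(
    torchaudio.transforms.MelSpectrogram(sample_rate=22050, n_fft=1024,
                                         hop_length=256, n_mels=128),
    torchaudio.transforms.AmplitudeToDB(),
)

def spec_augment(log_mel, freq_param, time_param):
    """Apply SpecAugment-style frequency and time masking with the given widths."""
    masked = torchaudio.transforms.FrequencyMasking(freq_mask_param=freq_param)(log_mel)
    masked = torchaudio.transforms.TimeMasking(time_mask_param=time_param)(masked)
    return masked

# Swin Transformer V2 classifier for the 11 IRMAS instrument classes,
# taking single-channel spectrograms as input.
model = timm.create_model("swinv2_tiny_window8_256", pretrained=True,
                          in_chans=1, num_classes=11).eval()

def evaluate_candidate(freq_param, time_param):
    """Placeholder scoring: the real system would train and validate the
    classifier with these mask widths and return validation accuracy."""
    wave = torch.randn(1, 3 * 22050)                           # 3 s of dummy audio
    spec = spec_augment(to_log_mel(wave), freq_param, time_param)
    spec = F.interpolate(spec.unsqueeze(0), size=(256, 256))   # match model input
    with torch.no_grad():
        logits = model(spec)
    return logits.softmax(dim=-1).max().item()                 # dummy proxy score

# Random search over mask widths, standing in for the fast SpecAugment
# parameter search the paper proposes.
candidates = [(random.randint(4, 32), random.randint(8, 64)) for _ in range(20)]
best_freq, best_time = max(candidates, key=lambda p: evaluate_candidate(*p))
print(f"best mask widths: freq={best_freq}, time={best_time}")
```

In an actual run, the proxy score above would be replaced by the validation performance of the classifier trained with each candidate's augmented spectrograms, so that the search selects the masking parameters that generalize best.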
References
Bosch, J.J., Janer, J., Fuhrmann, F., Herrera, P.: A comparison of sound segregation techniques for predominant instrument recognition in musical audio signals. In: ISMIR, pp. 559–564 (2012)
Deng, J.D., Simmermacher, C., Cranefield, S.: A study on feature analysis for musical instrument classification. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 38(2), 429–438 (2008)
Eronen, A.: Comparison of features for musical instrument recognition. In: Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics (Cat. No. 01TH8575), pp. 19–22. IEEE (2001)
Fanelli, A.M., Caponetti, L., Castellano, G., Buscicchio, C.A.: Content-based recognition of musical instruments. In: Proceedings of the Fourth IEEE International Symposium on Signal Processing and Information Technology, pp. 361–364. IEEE (2004)
Gaido, M., Gangi, M.A.D., Negri, M., Turchi, M.: End-to-end speech translation with knowledge distillation: FBK@IWSLT2020. In: Proceedings of the 17th International Conference on Spoken Language Translation, IWSLT 2020, Online, 9–10 July 2020, pp. 80–88. Association for Computational Linguistics (2020)
Gong, Y., Chung, Y.A., Glass, J.: AST: audio spectrogram transformer. In: Proceedings of Interspeech 2021, pp. 571–575 (2021)
Gururani, S., Sharma, M., Lerch, A.: An attention mechanism for musical instrument recognition. In: Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR 2019, Delft, The Netherlands, 4–8 November 2019, pp. 83–90 (2019)
Hidaka, S., Wakamiya, K., Kaburagi, T.: An investigation of the effectiveness of phase for audio classification. In: 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3708–3712. IEEE (2022)
Hung, Y.N., Chen, Y.A., Yang, Y.H.: Multitask learning for frame-level instrument recognition. In: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 381–385. IEEE (2019)
Hwang, Y., Cho, H., Yang, H., Won, D.O., Oh, I., Lee, S.W.: Mel-spectrogram augmentation for sequence to sequence voice conversion. arXiv preprint arXiv:2001.01401 (2020)
Li, X., Zhang, Y., Zhuang, X., Liu, D.: Frame-level SpecAugment for deep convolutional neural networks in hybrid ASR systems. In: 2021 IEEE Spoken Language Technology Workshop (SLT), pp. 209–214. IEEE (2021)
Liu, Z., et al.: Swin transformer V2: scaling up capacity and resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12009–12019 (2022)
Marques, J., Moreno, P.J.: A study of musical instrument classification using Gaussian mixture models and support vector machines. Cambridge Research Laboratory Technical Report Series CRL 4, 143 (1999)
Martin, K.D., Kim, Y.E.: Musical instrument identification: a pattern-recognition approach. J. Acoust. Soc. Am. 104(3), 1768 (1998)
Martin, K.D.: Sound-source recognition: a theory and computational model. Ph.D. thesis, Massachusetts Institute of Technology (1999)
Park, D.S., et al.: SpecAugment: a simple data augmentation method for automatic speech recognition. In: Proceedings of Interspeech 2019, pp. 2613–2617 (2019)
Racharla, K., Kumar, V., Jayant, C.B., Khairkar, A., Harish, P.: Predominant musical instrument classification based on spectral features. In: 2020 7th International Conference on Signal Processing and Integrated Networks (SPIN), pp. 617–622. IEEE (2020)
Saeed, A., Grangier, D., Zeghidour, N.: Contrastive learning of general-purpose audio representations. In: Proceedings of the ICASSP, pp. 3875–3879. IEEE (2021)
Solanki, A., Pandey, S.: Music instrument recognition using deep convolutional neural networks. Int. J. Inf. Technol. 14, 1659–1668 (2019)
Wang, H., Zou, Y., Chong, D.: Acoustic scene classification with spectrogram processing strategies. In: Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), Tokyo, Japan (Fully Virtual), 2–4 November 2020, pp. 210–214 (2020)
Xu, Y., Kong, Q., Wang, W., Plumbley, M.D.: Large-scale weakly supervised audio classification using gated convolutional neural network. In: Proceedings of the ICASSP, pp. 121–125. IEEE (2018)
Zeyer, A., Bahar, P., Irie, K., Schlüter, R., Ney, H.: A comparison of transformer and LSTM encoder decoder models for ASR. In: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 8–15. IEEE (2019)
Zhou, W., Michel, W., Irie, K., Kitza, M., Schlüter, R., Ney, H.: The RWTH ASR system for TED-LIUM release 2: improving hybrid HMM with SpecAugment. In: Proceedings of the ICASSP, pp. 7839–7843. IEEE (2020)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, H., Li, Z., Xing, S., Gu, Z., Wang, B. (2023). Boost Predominant Instrument Recognition Performance with MagiaSearch and MagiaClassifier. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14261. Springer, Cham. https://doi.org/10.1007/978-3-031-44198-1_11
Print ISBN: 978-3-031-44197-4
Online ISBN: 978-3-031-44198-1