Abstract
This study aims to overcome the performance limitations of existing instrument recognition systems in a cost-effective manner. Accurately identifying predominant instruments is a critical problem in music information retrieval, and it directly affects the performance of many downstream techniques. To address it, we propose a novel instrument recognition system that integrates a fast search technique, named MagiaSearch, which discovers reliable SpecAugment parameters for instrument recognition, with a deep neural network classifier, named MagiaClassifier, which uses Swin Transformer V2 as its backbone. Our experiments demonstrate that MagiaSearch effectively finds reliable SpecAugment parameters for the log mel spectrograms of instrument audio, that MagiaClassifier improves instrument recognition performance, and that combining MagiaSearch and MagiaClassifier achieves an accuracy of 88.76% on the 11-category predominant instrument recognition task of the IRMAS dataset.
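To make the pipeline described in the abstract concrete, the snippet below is a minimal sketch of how such a system could be assembled in Python with torchaudio and timm: log mel spectrograms, SpecAugment-style frequency/time masking with searchable mask widths, a Swin Transformer V2 backbone, and a small random search over the masking parameters. It is illustrative only, not the authors' MagiaSearch or MagiaClassifier implementation; the backbone variant (swinv2_tiny_window8_256), the spectrogram settings, the search ranges, and the evaluate_candidate scoring function are assumptions introduced here.

```python
# Illustrative sketch only: placeholder spectrogram settings, backbone choice,
# and scoring; not the paper's MagiaSearch/MagiaClassifier code.
import random
import torch
import torch.nn.functional as F
import torchaudio
import timm

# Log mel spectrogram front end (parameter values are illustrative).
to_log_mel = torch.nn.Sequential(
    torchaudio.transforms.MelSpectrogram(sample_rate=22050, n_fft=1024,
                                         hop_length=256, n_mels=128),
    torchaudio.transforms.AmplitudeToDB(),
)

def spec_augment(log_mel, freq_param, time_param):
    """Apply SpecAugment-style frequency and time masking with the given widths."""
    masked = torchaudio.transforms.FrequencyMasking(freq_mask_param=freq_param)(log_mel)
    masked = torchaudio.transforms.TimeMasking(time_mask_param=time_param)(masked)
    return masked

# Swin Transformer V2 classifier for the 11 IRMAS instrument classes,
# taking single-channel spectrograms as input.
model = timm.create_model("swinv2_tiny_window8_256", pretrained=True,
                          in_chans=1, num_classes=11).eval()

def evaluate_candidate(freq_param, time_param):
    """Placeholder scoring: the real system would train and validate the
    classifier with these mask widths and return validation accuracy."""
    wave = torch.randn(1, 3 * 22050)                           # 3 s of dummy audio
    spec = spec_augment(to_log_mel(wave), freq_param, time_param)
    spec = F.interpolate(spec.unsqueeze(0), size=(256, 256))   # match model input
    with torch.no_grad():
        logits = model(spec)
    return logits.softmax(dim=-1).max().item()                 # dummy proxy score

# Random search over mask widths, standing in for the fast SpecAugment
# parameter search the paper proposes.
candidates = [(random.randint(4, 32), random.randint(8, 64)) for _ in range(20)]
best_freq, best_time = max(candidates, key=lambda p: evaluate_candidate(*p))
print(f"best mask widths: freq={best_freq}, time={best_time}")
```

In an actual run, the proxy score above would be replaced by the validation performance of the classifier trained with each candidate's augmented spectrograms, so that the search selects the masking parameters that generalize best.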
References
Bosch, J.J., Janer, J., Fuhrmann, F., Herrera, P.: A comparison of sound segregation techniques for predominant instrument recognition in musical audio signals. In: ISMIR, pp. 559–564 (2012)
Deng, J.D., Simmermacher, C., Cranefield, S.: A study on feature analysis for musical instrument classification. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 38(2), 429–438 (2008)
Eronen, A.: Comparison of features for musical instrument recognition. In: Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics (Cat. No. 01TH8575), pp. 19–22. IEEE (2001)
Fanelli, A.M., Caponetti, L., Castellano, G., Buscicchio, C.A.: Content-based recognition of musical instruments. In: Proceedings of the Fourth IEEE International Symposium on Signal Processing and Information Technology, pp. 361–364. IEEE (2004)
Gaido, M., Gangi, M.A.D., Negri, M., Turchi, M.: End-to-end speech translation with knowledge distillation: FBK@IWSLT2020. In: Proceedings of the 17th International Conference on Spoken Language Translation, IWSLT 2020, Online, 9–10 July 2020, pp. 80–88. Association for Computational Linguistics (2020)
Gong, Y., Chung, Y.A., Glass, J.: AST: audio spectrogram transformer. In: Proceedings of Interspeech 2021, pp. 571–575 (2021)
Gururani, S., Sharma, M., Lerch, A.: An attention mechanism for musical instrument recognition. In: Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR 2019, Delft, The Netherlands, 4–8 November 2019, pp. 83–90 (2019)
Hidaka, S., Wakamiya, K., Kaburagi, T.: An investigation of the effectiveness of phase for audio classification. In: 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3708–3712. IEEE (2022)
Hung, Y.N., Chen, Y.A., Yang, Y.H.: Multitask learning for frame-level instrument recognition. In: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 381–385. IEEE (2019)
Hwang, Y., Cho, H., Yang, H., Won, D.O., Oh, I., Lee, S.W.: Mel-spectrogram augmentation for sequence to sequence voice conversion. arXiv preprint arXiv:2001.01401 (2020)
Li, X., Zhang, Y., Zhuang, X., Liu, D.: Frame-level SpecAugment for deep convolutional neural networks in hybrid ASR systems. In: 2021 IEEE Spoken Language Technology Workshop (SLT), pp. 209–214. IEEE (2021)
Liu, Z., et al.: Swin transformer V2: scaling up capacity and resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12009–12019 (2022)
Marques, J., Moreno, P.J.: A study of musical instrument classification using Gaussian mixture models and support vector machines. Cambridge Research Laboratory Technical Report Series CRL 4, 143 (1999)
Martin, K.D., Kim, Y.E.: Musical instrument identification: a pattern-recognition approach. J. Acoust. Soc. Am. 104(3), 1768 (1998)
Martin, K.D.: Sound-source recognition: a theory and computational model. Ph.D. thesis, Massachusetts Institute of Technology (1999)
Park, D.S., et al.: SpecAugment: a simple data augmentation method for automatic speech recognition. In: Proceedings of Interspeech 2019, pp. 2613–2617 (2019)
Racharla, K., Kumar, V., Jayant, C.B., Khairkar, A., Harish, P.: Predominant musical instrument classification based on spectral features. In: 2020 7th International Conference on Signal Processing and Integrated Networks (SPIN), pp. 617–622. IEEE (2020)
Saeed, A., Grangier, D., Zeghidour, N.: Contrastive learning of general-purpose audio representations. In: Proceedings of the ICASSP, pp. 3875–3879. IEEE (2021)
Solanki, A., Pandey, S.: Music instrument recognition using deep convolutional neural networks. Int. J. Inf. Technol. 14, 1659–1668 (2019)
Wang, H., Zou, Y., Chong, D.: Acoustic scene classification with spectrogram processing strategies. In: Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), Tokyo, Japan (Fully Virtual), 2–4 November 2020, pp. 210–214 (2020)
Xu, Y., Kong, Q., Wang, W., Plumbley, M.D.: Large-scale weakly supervised audio classification using gated convolutional neural network. In: Proceedings of the ICASSP, pp. 121–125. IEEE (2018)
Zeyer, A., Bahar, P., Irie, K., Schlüter, R., Ney, H.: A comparison of transformer and LSTM encoder decoder models for ASR. In: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 8–15. IEEE (2019)
Zhou, W., Michel, W., Irie, K., Kitza, M., Schlüter, R., Ney, H.: The RWTH ASR system for TED-LIUM release 2: improving hybrid HMM with SpecAugment. In: Proceedings of the ICASSP, pp. 7839–7843. IEEE (2020)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, H., Li, Z., Xing, S., Gu, Z., Wang, B. (2023). Boost Predominant Instrument Recognition Performance with MagiaSearch and MagiaClassifier. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14261. Springer, Cham. https://doi.org/10.1007/978-3-031-44198-1_11
Print ISBN: 978-3-031-44197-4
Online ISBN: 978-3-031-44198-1