
A sample-level DCNN for music auto-tagging

Multimedia Tools and Applications

Abstract

Deep convolutional neural networks (DCNNs) have been widely used in music auto-tagging, a multi-label classification task that predicts the tags of an audio signal. This paper presents a sample-level DCNN for music auto-tagging. The proposed DCNN highlights two components: strided convolutional layers, which extract local features and reduce the temporal dimension, and residual blocks from WaveNet, which preserve input resolution and extract more complex features. To further improve performance, a squeeze-and-excitation (SE) block is introduced into the residual block. Under the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) metric, experimental results on the MagnaTagATune (MTAT) dataset show that the two proposed models achieve 91.47% and 92.76%, respectively. Furthermore, our proposed models slightly surpass the state-of-the-art model, SampleCNN with an SE block.
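The sketch below illustrates the building blocks named in the abstract: strided 1-D convolutions over raw waveform samples to extract local features and shrink the temporal dimension, a WaveNet-style gated residual block that preserves resolution, and an SE block on the residual path. It is a minimal sketch assuming a PyTorch implementation; the channel counts, kernel sizes, number of layers, input length, and module names (SEBlock, SEResidualBlock, SampleLevelDCNN) are illustrative assumptions, not the authors' code, which is linked under Notes.

```python
# Hypothetical sketch of a sample-level DCNN with SE-augmented residual blocks.
# All hyperparameters and names are assumptions for illustration only.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Channel-wise recalibration (squeeze-and-excitation) for 1-D feature maps."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (batch, channels, time)
        s = x.mean(dim=-1)                       # squeeze: global average over time
        w = self.fc(s).unsqueeze(-1)             # excitation: per-channel weights
        return x * w                             # rescale channels

class SEResidualBlock(nn.Module):
    """WaveNet-style gated residual block with an SE block on the residual path."""
    def __init__(self, channels, dilation=1):
        super().__init__()
        self.filter_conv = nn.Conv1d(channels, channels, 3, padding=dilation, dilation=dilation)
        self.gate_conv = nn.Conv1d(channels, channels, 3, padding=dilation, dilation=dilation)
        self.out_conv = nn.Conv1d(channels, channels, 1)
        self.se = SEBlock(channels)

    def forward(self, x):
        h = torch.tanh(self.filter_conv(x)) * torch.sigmoid(self.gate_conv(x))
        h = self.se(self.out_conv(h))
        return x + h                             # input resolution is preserved

class SampleLevelDCNN(nn.Module):
    """Strided convolutions reduce the temporal dimension; SE-residual blocks
    extract higher-level features; global pooling feeds a sigmoid tag classifier."""
    def __init__(self, n_tags=50, channels=128, n_strided=9, n_res=3):
        super().__init__()
        layers = [nn.Conv1d(1, channels, kernel_size=3, stride=3), nn.ReLU(inplace=True)]
        for _ in range(n_strided - 1):           # each layer shrinks time by 3x
            layers += [nn.Conv1d(channels, channels, kernel_size=3, stride=3),
                       nn.BatchNorm1d(channels), nn.ReLU(inplace=True)]
        self.frontend = nn.Sequential(*layers)
        self.res_blocks = nn.Sequential(*[SEResidualBlock(channels, dilation=2 ** i)
                                          for i in range(n_res)])
        self.classifier = nn.Linear(channels, n_tags)

    def forward(self, wav):                      # wav: (batch, 1, samples)
        z = self.res_blocks(self.frontend(wav))
        z = z.mean(dim=-1)                       # global average pooling over time
        return torch.sigmoid(self.classifier(z)) # multi-label tag probabilities

# Usage: nine stride-3 layers reduce a 59049-sample (3^10) excerpt to 3 frames.
model = SampleLevelDCNN(n_tags=50)
tags = model(torch.randn(2, 1, 59049))           # -> (2, 50) tag probabilities
```

Because each residual block keeps the temporal length fixed, deeper feature extraction after the strided front end does not further coarsen the representation, which is the role the abstract assigns to the WaveNet-style blocks.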


Notes

  1. https://github.com/qmh1234567/music-auto-tagging-by-sample-level-DCNN

References

  1. Bertin-Mahieux T, Ellis DP, Whitman B, Lamere P (2011) The million song dataset. In: Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), Miami

  2. Choi K, Fazekas G, Sandler M, Cho K (2017) Convolutional recurrent neural networks for music classification. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, pp 2392–2396

  3. Dieleman S, Schrauwen B (2014) End-to-end learning for music audio. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, pp 6964–6968

  4. Fu Z, Lu G, Ting KM, Zhang D (2011) A survey of audio-based music classification and annotation. IEEE Trans Multimed 13(2):303–319


  5. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, Los Alamitos, pp 770–778

  6. Holschneider M, Kronland-Martinet R, Morlet J, Tchamitchian P, Combes JM, Grossman A (1989) A real-time algorithm for signal analysis with the help of the wavelet transform. Wavelets, Time-Frequency Methods and Phase Space 1:286

  7. Hoshen Y, Weiss RJ, Wilson KW (2015) Speech acoustic modeling from raw multichannel waveforms. In: 40th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brisbane, pp 4624–4628

  8. Hu J, Shen L, Albanie S, Sun G, Wu E (2020) Squeeze-and-excitation networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 42:2011–2023

  9. Kim T, Lee J, Nam J (2019) Comparison and analysis of sample cnn architectures for audio classification. IEEE Journal of Selected Topics in Signal Processing 13:285–297

  10. Kumar A, Rajpal A, Rathore D (2018) Genre classification using feature extraction and deep learning techniques. In: 2018 10th International Conference on Knowledge and Systems Engineering (KSE), Ho Chi Minh City, pp 175–180

  11. Lee J, Nam J (2017) Multi-level and multi-scale feature aggregation using pretrained convolutional neural networks for music auto-tagging. IEEE Signal Processing Letters 24(8):1208–1212

  12. Lee J, Park J, Kim KL, Nam J (2017) Sample-level deep convolutional neural networks for music auto-tagging using raw waveforms. arXiv preprint arXiv:1703.01789

  13. Lin Y, Chung C, Chen HH (2018) Playlist-based tag propagation for improving music auto-tagging. In: European Signal Processing Conference (EUSIPCO), Rome, pp 2270–2274

  14. Nam J, Choi K, Lee J, Chou S, Yang Y (2019) Deep learning for audio-based music classification and tagging: Teaching computers to distinguish rock from Bach. IEEE Signal Processing Magazine 36(1):41–51

  15. Oord AVD, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kalchbrenner N, Senior A, Kavukcuoglu K (2016) Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499

  16. Pons J, Lidy T, Serra X (2016) Experimenting with musically motivated convolutional neural networks. In: 2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI), Bucharest, pp 1–6

  17. Rajanna AR, Aryafar K, Shokoufandeh A, Ptucha R (2015) Deep neural networks: A case study for music genre classification. In: 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), Miami, pp 655–660

  18. Song G, Wang Z, Han F, Ding S, Iqbal MA (2018) Music auto-tagging using deep recurrent neural networks. Neurocomputing 292:104–110

  19. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: A simple way to prevent neural networks from overfitting. J Mach Learn Res 15:1929–1958

  20. Ulaganathan AS, Ramanna S (2019) Granular methods in automatic music genre classification: a case study. J Intell Inf Syst 52(1):85–105

  21. van den Oord A, Kalchbrenner N, Vinyals O, Espeholt L, Graves A, Kavukcuoglu K (2016) Conditional image generation with PixelCNN decoders. Advances in Neural Information Processing Systems 29 (NIPS 2016)

  22. Zen H, Agiomyrgiannakis Y, Egberts N, Henderson F, Szczepaniak P (2016) Fast, compact, and high quality LSTM-RNN based statistical parametric speech synthesizers for mobile devices. In: 17th Annual Conference of the International Speech Communication Association (Interspeech 2016), pp 2273–2277


Acknowledgments

This work is supported by the Research Fund for International Young Scientists of the National Natural Science Foundation of China (NSFC Grant No. 61550110248), the Sichuan Science and Technology Program (Grant No. 2019YFG0190), and Research on Sino-Tibetan multi-source information acquisition, fusion, data mining and its application (Grant No. H04W170186).

Author information


Corresponding author

Correspondence to Min-hui Qi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Yu, Yb., Qi, Mh., Tang, Yf. et al. A sample-level DCNN for music auto-tagging. Multimed Tools Appl 80, 11459–11469 (2021). https://doi.org/10.1007/s11042-020-10330-9


