Music Genre Classification: Looking for the Perfect Network

Kostrzewa, Daniel; Kaminski, Piotr; Brzeski, Robert

doi:10.1007/978-3-030-77961-0_6

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12742)

Included in the following conference series:

International Conference on Computational Science

Abstract

This paper presents research on music genre recognition. The task is crucial because online databases hold millions of songs, and classifying them by hand is impossible or extremely expensive. It is therefore desirable to create methods that can automatically assign a given track to a music genre. Here, music tracks are classified by deep learning models. The experiments were performed on the Free Music Archive dataset. The tests used a Convolutional Neural Network, Convolutional Recurrent Neural Networks with 1D and 2D convolutions, and a Recurrent Neural Network with Long Short-Term Memory cells. To combine the advantages of different deep neural network architectures, several types of ensembles were proposed, along with two methods of combining their results. The best results obtained in this paper, which match those of state-of-the-art methods, were achieved by one of the proposed ensembles. The described solution can help make the auto-tagging of songs by musical genre much faster and more accurate.
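
The abstract does not say which two methods were used to combine the ensemble members' outputs. As an illustration only, the sketch below shows two common choices: averaging the per-class probabilities (soft voting) and taking a majority vote over each model's top class (hard voting). The function names, the three-model setup, and the probability values are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def average_probabilities(model_probs: Sequence[Sequence[float]]) -> int:
    """Soft voting: average class probabilities over models, then pick the top class."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    mean = [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]
    return argmax(mean)


def majority_vote(model_probs: Sequence[Sequence[float]]) -> int:
    """Hard voting: each model votes for its top class; the most frequent class wins.

    Ties are broken in favour of the lowest class index (an arbitrary choice).
    """
    votes = Counter(argmax(p) for p in model_probs)
    best = max(votes.values())
    return min(c for c, n in votes.items() if n == best)


# Hypothetical outputs of three models for one track over three genres.
probs: List[List[float]] = [
    [0.40, 0.35, 0.25],  # e.g. a CNN, weakly preferring genre 0
    [0.45, 0.40, 0.15],  # e.g. a CRNN, weakly preferring genre 0
    [0.05, 0.90, 0.05],  # e.g. an LSTM network, strongly preferring genre 1
]

print(average_probabilities(probs))  # 1: the confident model dominates the mean
print(majority_vote(probs))          # 0: two of the three models vote for genre 0
```

The two rules can disagree, as they do here: averaging lets one confident model outweigh two hesitant ones, while majority voting ignores confidence entirely.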

This work was supported by Statutory Research funds of Department of Applied Informatics, Silesian University of Technology, Gliwice, Poland (BKM21 – DK, BK 02/100/BK_21/0008 – RB).

Author information

Authors and Affiliations

Department of Applied Informatics, Silesian University of Technology, Gliwice, Poland
Daniel Kostrzewa, Piotr Kaminski & Robert Brzeski

Corresponding author

Correspondence to Daniel Kostrzewa.

Editor information

Editors and Affiliations

AGH University of Science and Technology, Krakow, Poland
Maciej Paszynski
Ludwig-Maximilians-Universität München, Munich, Germany
Dieter Kranzlmüller
University of Amsterdam, Amsterdam, The Netherlands
Valeria V. Krzhizhanovskaya
University of Tennessee at Knoxville, Knoxville, TN, USA
Jack J. Dongarra
University of Amsterdam, Amsterdam, The Netherlands
Peter M. A. Sloot

Cite this paper

Kostrzewa, D., Kaminski, P., Brzeski, R. (2021). Music Genre Classification: Looking for the Perfect Network. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12742. Springer, Cham. https://doi.org/10.1007/978-3-030-77961-0_6

DOI: https://doi.org/10.1007/978-3-030-77961-0_6
Published: 09 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77960-3
Online ISBN: 978-3-030-77961-0
eBook Packages: Computer Science, Computer Science (R0)
