Abstract
This paper presents research on music genre recognition. The task is crucial because online databases hold millions of songs, and classifying them by hand is either impossible or extremely expensive. It is therefore desirable to create methods that can automatically assign a given track to a music genre. Here, music tracks are classified by deep learning models, and the experiments were performed on the Free Music Archive dataset. The tested models were a Convolutional Neural Network, Convolutional Recurrent Neural Networks with 1D and 2D convolutions, and a Recurrent Neural Network with Long Short-Term Memory cells. To combine the advantages of the different deep neural network architectures, several types of ensembles were proposed, each evaluated with two methods of mixing the results. The best results obtained in this paper, which match those of state-of-the-art methods, were achieved by one of the proposed ensembles. The solution described in the paper can help make the auto-tagging of songs with musical genres much faster and more accurate.
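The abstract mentions ensembles whose member outputs are mixed in two ways but does not name the two methods. As an illustration only, the sketch below shows two common choices, soft voting (averaging class probabilities) and hard voting (majority vote over each model's top class). The function names, the three-genre setup, and the probability values are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def soft_vote(prob_sets):
    """Average class probabilities across models, then pick the argmax."""
    n_models = len(prob_sets)
    n_classes = len(prob_sets[0])
    mean = [sum(p[c] for p in prob_sets) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


def hard_vote(prob_sets):
    """Each model votes for its own argmax; the most common vote wins."""
    votes = [max(range(len(p)), key=lambda c: p[c]) for p in prob_sets]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical outputs of three models for one track over three genres
# (illustrative numbers, not results from the paper).
outputs = [
    [0.90, 0.05, 0.05],  # e.g. a CNN that is very confident in genre 0
    [0.30, 0.40, 0.30],  # e.g. a CRNN that weakly prefers genre 1
    [0.35, 0.45, 0.20],  # e.g. an LSTM network that weakly prefers genre 1
]

print("soft voting picks genre", soft_vote(outputs))
print("hard voting picks genre", hard_vote(outputs))
```

With these made-up numbers the two rules disagree: soft voting selects genre 0 because one model is highly confident, while hard voting selects genre 1 because two of the three models rank it first. This is why the choice of mixing method can matter for an ensemble.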
This work was supported by the Statutory Research funds of the Department of Applied Informatics, Silesian University of Technology, Gliwice, Poland (BKM21 – DK, BK 02/100/BK_21/0008 – RB).
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Kostrzewa, D., Kaminski, P., Brzeski, R. (2021). Music Genre Classification: Looking for the Perfect Network. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12742. Springer, Cham. https://doi.org/10.1007/978-3-030-77961-0_6
DOI: https://doi.org/10.1007/978-3-030-77961-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77960-3
Online ISBN: 978-3-030-77961-0
eBook Packages: Computer Science, Computer Science (R0)