Abstract
In this work we study the use of convolutional neural networks for genre recognition in symbolically represented music. Specifically, we explore the effects of changing network depth, width and kernel sizes while keeping the number of trainable parameters and each block's receptive field constant. We propose an architecture for handling MIDI data that makes use of multiple resolutions of the input, called MuSeReNet (Multiple Sequence Resolution Network). In our experiments we significantly outperform the state of the art for MIDI genre recognition on the topMAGD and MASD datasets.
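To illustrate the multiple-resolution idea, the following is a minimal sketch of a CNN that processes a piano-roll MIDI representation in parallel branches, each operating on a different temporal resolution of the same input. All layer sizes, branch counts, and the genre count are hypothetical placeholders for illustration; the actual MuSeReNet architecture is described in the full paper, not in this excerpt.

```python
import torch
import torch.nn as nn

class MultiResolutionCNN(nn.Module):
    """Illustrative multi-resolution CNN over a piano-roll input.

    Input shape: (batch, n_pitches, time). Each branch first downsamples
    time by a different factor, then applies the same small conv stack.
    Hypothetical sizes throughout; not the paper's exact architecture.
    """

    def __init__(self, n_pitches=128, n_genres=13, resolutions=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList()
        for r in resolutions:
            self.branches.append(nn.Sequential(
                # Coarsen the time axis by factor r (identity at full resolution).
                nn.AvgPool1d(kernel_size=r, stride=r) if r > 1 else nn.Identity(),
                nn.Conv1d(n_pitches, 64, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv1d(64, 64, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),  # global average pooling over time
            ))
        self.classifier = nn.Linear(64 * len(resolutions), n_genres)

    def forward(self, x):
        # Concatenate pooled features from every resolution branch.
        feats = [branch(x).squeeze(-1) for branch in self.branches]
        return self.classifier(torch.cat(feats, dim=1))

model = MultiResolutionCNN()
logits = model(torch.randn(2, 128, 256))  # 2 clips, 128 pitches, 256 time steps
print(logits.shape)  # one logit vector per clip, one entry per genre
```

Processing the same sequence at several resolutions lets shallow convolutions cover both short-range and long-range temporal context without deepening the network, which is consistent with the paper's stated goal of varying architecture while holding parameter count and receptive field constant.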
Notes
- 1. The code is available in the following GitHub repository: https://github.com/kinezodin/cnn-midi-genre.
Acknowledgment
This research is carried out and funded in the context of the project "Automatic Music Composition with Hybrid Models of Knowledge Representation, Automatic Reasoning and Deep Machine Learning" (5049188) under the call for proposals "Researchers' support with an emphasis on young researchers - 2nd Cycle". The project is co-financed by Greece and the European Union (European Social Fund - ESF) through the Operational Programme "Human Resources Development, Education and Lifelong Learning 2014–2020".
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Dervakos, E., Kotsani, N., Stamou, G. (2021). Genre Recognition from Symbolic Music with CNNs. In: Romero, J., Martins, T., Rodríguez-Fernández, N. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2021. Lecture Notes in Computer Science(), vol 12693. Springer, Cham. https://doi.org/10.1007/978-3-030-72914-1_7
Print ISBN: 978-3-030-72913-4
Online ISBN: 978-3-030-72914-1