Genre Recognition from Symbolic Music with CNNs

  • Conference paper
  • In: Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART 2021)

Abstract

In this work we study the use of convolutional neural networks for genre recognition in symbolically represented music. Specifically, we explore the effects of changing network depth, width, and kernel sizes while keeping the number of trainable parameters and each block's receptive field constant. We propose the Multiple Sequence Resolution Network (MuSeReNet), an architecture for handling MIDI data that makes use of multiple resolutions of the input. Through our experiments we significantly outperform the state of the art for MIDI genre recognition on the topMAGD and MASD datasets.
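To make the multiple-resolution idea concrete, the following is a minimal sketch (not the authors' implementation; the published code is in the GitHub repository linked in the notes below) of a network in this spirit: several 1D-CNN branches read the same piano-roll input at different temporal downsampling factors, and their pooled features are concatenated before classification. All layer widths, kernel sizes, downsampling factors, and the genre count are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MuSeReNetSketch(nn.Module):
    # Illustrative multi-resolution CNN; hyperparameters here are
    # assumptions, not the published MuSeReNet architecture.
    def __init__(self, n_pitches=128, n_genres=13, resolutions=(1, 4, 16)):
        super().__init__()
        self.resolutions = resolutions
        # One small convolutional branch per input resolution.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d(n_pitches, 64, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.Conv1d(64, 64, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),  # global average pooling over time
            )
            for _ in resolutions
        )
        self.classifier = nn.Linear(64 * len(resolutions), n_genres)

    def forward(self, x):
        # x: (batch, n_pitches, time), e.g. a binary piano-roll
        feats = []
        for factor, branch in zip(self.resolutions, self.branches):
            # Downsample the time axis to produce a coarser view.
            xi = F.avg_pool1d(x, factor) if factor > 1 else x
            feats.append(branch(xi).squeeze(-1))
        return self.classifier(torch.cat(feats, dim=1))

model = MuSeReNetSketch()
print(model(torch.rand(2, 128, 1024)).shape)  # torch.Size([2, 13])

Concatenating per-branch globally pooled features is one simple merge strategy; the paper's controlled comparisons (varying depth, width, and kernel size at constant parameter count and receptive field) would correspond to systematically adjusting the branch hyperparameters above.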


Notes

  1. The code is available in the following GitHub repository: https://github.com/kinezodin/cnn-midi-genre.

  2. https://www.reddit.com/r/WeAreTheMusicMakers/comments/3ajwe4/the_largest_midi_collection_on_the_internet/.

  3. http://static.echonest.com/enspex/.


Acknowledgment

This research was carried out and funded in the context of the project "Automatic Music Composition with Hybrid Models of Knowledge Representation, Automatic Reasoning and Deep Machine Learning" (5049188) under the call for proposals "Researchers' support with an emphasis on young researchers - 2nd Cycle". The project is co-financed by Greece and the European Union (European Social Fund, ESF) through the Operational Programme "Human Resources Development, Education and Lifelong Learning 2014–2020".

Author information

Correspondence to Natalia Kotsani.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Dervakos, E., Kotsani, N., Stamou, G. (2021). Genre Recognition from Symbolic Music with CNNs. In: Romero, J., Martins, T., Rodríguez-Fernández, N. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2021. Lecture Notes in Computer Science, vol. 12693. Springer, Cham. https://doi.org/10.1007/978-3-030-72914-1_7

  • DOI: https://doi.org/10.1007/978-3-030-72914-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72913-4

  • Online ISBN: 978-3-030-72914-1

  • eBook Packages: Computer Science, Computer Science (R0)
