Abstract
In this work we study the use of convolutional neural networks for genre recognition in symbolically represented music. Specifically, we explore the effects of changing network depth, width and kernel sizes while keeping the number of trainable parameters and each block's receptive field constant. We propose an architecture for handling MIDI data that makes use of multiple resolutions of the input, called MuSeReNet (Multiple Sequence Resolution Network). In our experiments we significantly outperform the state of the art for MIDI genre recognition on the topMAGD and MASD datasets.
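To illustrate the multiple-resolution idea, the following is a minimal sketch of a CNN that processes a piano-roll MIDI representation in parallel branches, each operating on a different temporal resolution of the same input. All layer sizes, branch counts, and the genre count are hypothetical placeholders for illustration; the actual MuSeReNet architecture is described in the full paper, not in this excerpt.

```python
import torch
import torch.nn as nn

class MultiResolutionCNN(nn.Module):
    """Illustrative multi-resolution CNN over a piano-roll input.

    Input shape: (batch, n_pitches, time). Each branch first downsamples
    time by a different factor, then applies the same small conv stack.
    Hypothetical sizes throughout; not the paper's exact architecture.
    """

    def __init__(self, n_pitches=128, n_genres=13, resolutions=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList()
        for r in resolutions:
            self.branches.append(nn.Sequential(
                # Coarsen the time axis by factor r (identity at full resolution).
                nn.AvgPool1d(kernel_size=r, stride=r) if r > 1 else nn.Identity(),
                nn.Conv1d(n_pitches, 64, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv1d(64, 64, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),  # global average pooling over time
            ))
        self.classifier = nn.Linear(64 * len(resolutions), n_genres)

    def forward(self, x):
        # Concatenate pooled features from every resolution branch.
        feats = [branch(x).squeeze(-1) for branch in self.branches]
        return self.classifier(torch.cat(feats, dim=1))

model = MultiResolutionCNN()
logits = model(torch.randn(2, 128, 256))  # 2 clips, 128 pitches, 256 time steps
print(logits.shape)  # one logit vector per clip, one entry per genre
```

Processing the same sequence at several resolutions lets shallow convolutions cover both short-range and long-range temporal context without deepening the network, which is consistent with the paper's stated goal of varying architecture while holding parameter count and receptive field constant.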
Notes
- 1. The code is available in the following GitHub repository: https://github.com/kinezodin/cnn-midi-genre.
Acknowledgment
This research is carried out and funded in the context of the project "Automatic Music Composition with Hybrid Models of Knowledge Representation, Automatic Reasoning and Deep Machine Learning" (5049188) under the call for proposals "Researchers' support with an emphasis on young researchers - 2nd Cycle". The project is co-financed by Greece and the European Union (European Social Fund - ESF) through the Operational Programme "Human Resources Development, Education and Lifelong Learning 2014–2020".
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Dervakos, E., Kotsani, N., Stamou, G. (2021). Genre Recognition from Symbolic Music with CNNs. In: Romero, J., Martins, T., Rodríguez-Fernández, N. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2021. Lecture Notes in Computer Science(), vol 12693. Springer, Cham. https://doi.org/10.1007/978-3-030-72914-1_7
Print ISBN: 978-3-030-72913-4
Online ISBN: 978-3-030-72914-1