Abstract
Several recent generative music applications employ a pre-trained transformer neural model. The way in which these models are trained greatly affects the nature of the music they produce, but the extent of this phenomenon has received little exploration. We provide here a systematic evaluation of the output from GPT-2-based transformer models by analysing, comparing and contrasting the output from multiple models trained under various conditions, with reference to numerous musical metrics. As a further element of our methodology, we describe a web application for exploring the output of such models. We conclude with a summary of our findings on how training affects output, a discussion of how our methodology could be used in generative music practice, and future avenues for research.
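To illustrate the kind of metric-based comparison the abstract describes, the sketch below computes a 12-bin pitch-class histogram for each of two hypothetical model outputs and measures their difference with KL divergence. This is a minimal illustration, not the paper's actual evaluation pipeline: the metric choice, the smoothing constant, and the example note sequences are all assumptions introduced here.

```python
# Hypothetical sketch: comparing two models' outputs via one possible
# musical metric (pitch-class histograms) and KL divergence.
from collections import Counter
from math import log

def pitch_class_histogram(midi_pitches, smoothing=1e-6):
    """Normalised 12-bin pitch-class distribution from MIDI pitch numbers.

    A small smoothing constant keeps every bin strictly positive so that
    KL divergence is well defined.
    """
    counts = Counter(p % 12 for p in midi_pitches)
    total = sum(counts.values()) + 12 * smoothing
    return [(counts.get(pc, 0) + smoothing) / total for pc in range(12)]

def kl_divergence(p, q):
    """D_KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative (made-up) note sequences standing in for model output:
model_a = [60, 62, 64, 65, 67, 69, 71, 72, 60, 64, 67]  # diatonic leaning
model_b = [60, 61, 63, 66, 68, 70, 60, 63, 66]          # more chromatic

h_a = pitch_class_histogram(model_a)
h_b = pitch_class_histogram(model_b)
divergence = kl_divergence(h_a, h_b)
```

In practice each histogram would be computed over a large sample of generated pieces per training condition, and several such metrics would be aggregated rather than a single divergence value.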
B. Banar—Research student at the UKRI Centre for Doctoral Training in Artificial Intelligence and Music, supported jointly by UK Research and Innovation [grant number EP/S022694/1] and Queen Mary University of London.
Acknowledgements
We wish to thank the anonymous reviewers for their insightful comments.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Banar, B., Colton, S. (2022). A Systematic Evaluation of GPT-2-Based Music Generation. In: Martins, T., Rodríguez-Fernández, N., Rebelo, S.M. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2022. Lecture Notes in Computer Science, vol 13221. Springer, Cham. https://doi.org/10.1007/978-3-031-03789-4_2
DOI: https://doi.org/10.1007/978-3-031-03789-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-03788-7
Online ISBN: 978-3-031-03789-4