
A Systematic Evaluation of GPT-2-Based Music Generation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13221)

Abstract

Various recent generative music applications employ a pre-trained transformer neural model. The way in which these models are trained greatly affects the nature of the music they produce, but the extent of this phenomenon has seen little exploration. We provide a systematic evaluation of the output from GPT-2-based transformer models by analysing, comparing and contrasting the output of multiple models trained under various conditions, with reference to numerous musical metrics. As a further element of our methodology, we describe a web application for exploring the output of such models. We conclude with a summary of our findings on how training affects output, a discussion of how our methodology could be used in generative music practice, and future avenues for research.
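
The abstract does not enumerate the musical metrics here, but as a rough illustration of the kind of metric-based comparison it describes, the following Python sketch computes pitch-class histograms for two note sequences and measures how far they diverge. The note lists, function names and the choice of Kullback-Leibler divergence are illustrative assumptions only, not the authors' actual evaluation code.

# Illustrative sketch (assumed, not the authors' code): compare two note
# sequences via pitch-class histograms and a smoothed KL divergence.
import math
from collections import Counter

def pitch_class_histogram(midi_pitches):
    # Normalised 12-bin histogram of pitch classes (C=0 ... B=11).
    counts = Counter(p % 12 for p in midi_pitches)
    total = sum(counts.values()) or 1
    return [counts.get(pc, 0) / total for pc in range(12)]

def kl_divergence(p, q, eps=1e-9):
    # KL(p || q) with additive smoothing to avoid log(0) and division by zero.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical MIDI pitch lists, e.g. extracted from generated and training pieces.
generated = [60, 62, 64, 65, 67, 69, 71, 72, 60, 64, 67]   # mostly C major
reference = [60, 61, 63, 66, 68, 70, 60, 63, 66]            # more chromatic

h_gen = pitch_class_histogram(generated)
h_ref = pitch_class_histogram(reference)
print(f"KL(generated || reference) = {kl_divergence(h_gen, h_ref):.3f}")

Aggregating such distances over several note-level statistics (pitch range, durations, interval distributions and so on) is one plausible way to compare models trained under different conditions.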

B. Banar—Research student at the UKRI Centre for Doctoral Training in Artificial Intelligence and Music, supported jointly by UK Research and Innovation [grant number EP/S022694/1] and Queen Mary University of London.

Acknowledgements

We wish to thank the anonymous reviewers for their insightful comments.

Author information

Corresponding author

Correspondence to Berker Banar.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Banar, B., Colton, S. (2022). A Systematic Evaluation of GPT-2-Based Music Generation. In: Martins, T., Rodríguez-Fernández, N., Rebelo, S.M. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2022. Lecture Notes in Computer Science, vol 13221. Springer, Cham. https://doi.org/10.1007/978-3-031-03789-4_2

  • DOI: https://doi.org/10.1007/978-3-031-03789-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-03788-7

  • Online ISBN: 978-3-031-03789-4

  • eBook Packages: Computer Science, Computer Science (R0)
