Abstract
Several recent generative music applications employ a pre-trained transformer neural model. The way in which these models are trained greatly affects the nature of the music they produce, but the extent of this phenomenon has received little exploration. We provide here a systematic evaluation of the output from GPT-2-based transformer models by analysing, comparing and contrasting the output from multiple models trained under various conditions, with reference to numerous musical metrics. As a further element of our methodology, we describe a web application for exploring the output of such models. We conclude with a summary of our findings on how training affects output, a discussion of how our methodology could be used in generative music practice, and future avenues for research.
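To illustrate the kind of metric-based comparison the abstract describes, the sketch below computes a 12-bin pitch-class histogram for each of two hypothetical model outputs and measures their difference with KL divergence. This is a minimal illustration, not the paper's actual evaluation pipeline: the metric choice, the smoothing constant, and the example note sequences are all assumptions introduced here.

```python
# Hypothetical sketch: comparing two models' outputs via one possible
# musical metric (pitch-class histograms) and KL divergence.
from collections import Counter
from math import log

def pitch_class_histogram(midi_pitches, smoothing=1e-6):
    """Normalised 12-bin pitch-class distribution from MIDI pitch numbers.

    A small smoothing constant keeps every bin strictly positive so that
    KL divergence is well defined.
    """
    counts = Counter(p % 12 for p in midi_pitches)
    total = sum(counts.values()) + 12 * smoothing
    return [(counts.get(pc, 0) + smoothing) / total for pc in range(12)]

def kl_divergence(p, q):
    """D_KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative (made-up) note sequences standing in for model output:
model_a = [60, 62, 64, 65, 67, 69, 71, 72, 60, 64, 67]  # diatonic leaning
model_b = [60, 61, 63, 66, 68, 70, 60, 63, 66]          # more chromatic

h_a = pitch_class_histogram(model_a)
h_b = pitch_class_histogram(model_b)
divergence = kl_divergence(h_a, h_b)
```

In practice each histogram would be computed over a large sample of generated pieces per training condition, and several such metrics would be aggregated rather than a single divergence value.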
B. Banar—Research student at the UKRI Centre for Doctoral Training in Artificial Intelligence and Music, supported jointly by UK Research and Innovation [grant number EP/S022694/1] and Queen Mary University of London.
Acknowledgements
We wish to thank the anonymous reviewers for their insightful comments.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Banar, B., Colton, S. (2022). A Systematic Evaluation of GPT-2-Based Music Generation. In: Martins, T., Rodríguez-Fernández, N., Rebelo, S.M. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2022. Lecture Notes in Computer Science, vol 13221. Springer, Cham. https://doi.org/10.1007/978-3-031-03789-4_2
DOI: https://doi.org/10.1007/978-3-031-03789-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-03788-7
Online ISBN: 978-3-031-03789-4