
Linear Transformer-GAN: A Novel Architecture to Symbolic Music Generation

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2023 (ICANN 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14260)


Abstract

Generating long-structured music comparable to human compositions remains an open research problem. Since their introduction, the Transformer and its variants, which rely on self-attention, have become popular for generating long-structured music. However, these models are trained with teacher forcing, which causes exposure bias: at inference time the model conditions on its own, possibly flawed, predictions rather than on ground truth. As a consequence, the generative model cannot reliably produce music that adheres to music theory. To address this issue, we propose a new Linear Transformer-GAN structure that generates high-quality music using a discriminator trained to detect exposure bias. The Linear Transformer, an efficient transformer variant, is integrated with a generative adversarial network (GAN) to form our proposed model. To overcome the difficulty of training GANs on discrete sequence data, we use the policy gradient and present a new discriminator structure that evaluates the reward of the current sequence along several dimensions of musical information. We train the discriminator with both the cross-entropy loss over these information dimensions and a music-theoretic mechanism. Our experiments, using both objective metrics and human evaluation, demonstrate that the proposed model generates music that is more consistent with music theory and perceived as more pleasant by listeners. Overall, our approach offers a promising remedy for the exposure bias problem in long-structured music generation and a more effective means of generating music that adheres to established music-theoretic principles.
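Since the chapter body is not reproduced here, a brief sketch may help situate the abstract's two key ingredients: causal linear attention, the efficiency trick behind the Linear Transformer (Katharopoulos et al., ICML 2020), and SeqGAN-style policy-gradient training, which treats the discriminator's score as a per-step reward so the generator can learn despite the non-differentiable sampling of discrete tokens. The PyTorch sketch below is a minimal illustration of those general formulations, not the authors' implementation; all names in it (feature_map, causal_linear_attention, policy_gradient_loss) are hypothetical, and the paper's multi-dimensional, music-theory-aware reward is stood in for by a random placeholder.

```python
import torch
import torch.nn.functional as F


def feature_map(x):
    # Positive feature map phi(x) = elu(x) + 1 used by Katharopoulos et al.
    # (2020) to replace the softmax kernel.
    return F.elu(x) + 1.0


def causal_linear_attention(q, k, v, eps=1e-6):
    # q, k: (batch, seq, d); v: (batch, seq, e).
    # softmax(QK^T)V is replaced by phi(Q)(phi(K)^T V); with causal masking
    # the sums over past positions become prefix sums, so the cost is linear
    # in sequence length rather than quadratic.
    q, k = feature_map(q), feature_map(k)
    kv = torch.einsum("bsd,bse->bsde", k, v).cumsum(dim=1)  # running phi(k_i) v_i^T
    z = k.cumsum(dim=1)                                     # running phi(k_i)
    num = torch.einsum("bsd,bsde->bse", q, kv)
    den = torch.einsum("bsd,bsd->bs", q, z).unsqueeze(-1) + eps
    return num / den


def policy_gradient_loss(token_log_probs, rewards):
    # REINFORCE-style generator objective for a sequence GAN (as in SeqGAN):
    # maximize the expected per-step reward of the sampled tokens, so the
    # generator receives a learning signal despite the discrete sampling step.
    # token_log_probs, rewards: (batch, seq).
    return -(token_log_probs * rewards).sum(dim=1).mean()


# Toy shapes only: random tensors stand in for a trained generator, and
# random numbers stand in for the discriminator's per-step rewards.
q, k = torch.randn(2, 16, 32), torch.randn(2, 16, 32)
v = torch.randn(2, 16, 64)
out = causal_linear_attention(q, k, v)                            # (2, 16, 64)

logits = torch.randn(2, 16, 100)                                  # vocab of 100 tokens
tokens = torch.distributions.Categorical(logits=logits).sample()  # (2, 16)
log_probs = torch.log_softmax(logits, dim=-1)
token_log_probs = log_probs.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)
rewards = torch.rand(2, 16)                                       # placeholder scores
loss = policy_gradient_loss(token_log_probs, rewards)
```

In the paper's setting the per-step rewards would come from the proposed discriminator, evaluated along several dimensions of musical information; the torch.rand placeholder above only fixes the shape contract.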



Author information

Correspondence to Dingxiaofei Tian.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Tian, D., Chen, J., Gao, Z., Pan, G. (2023). Linear Transformer-GAN: A Novel Architecture to Symbolic Music Generation. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14260. Springer, Cham. https://doi.org/10.1007/978-3-031-44195-0_37

  • DOI: https://doi.org/10.1007/978-3-031-44195-0_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44194-3

  • Online ISBN: 978-3-031-44195-0

  • eBook Packages: Computer Science, Computer Science (R0)
