Abstract
Generating long-structured music comparable to human compositions remains an open research problem. Since their introduction, Transformer models and their variants, which rely on self-attention, have become popular for long-structured music generation. However, these models are trained with teacher forcing, which causes an exposure bias problem: the generative model cannot reliably produce music that adheres to music theory. To address this issue, we propose a new Linear Transformer-GAN architecture that generates high-quality music using a discriminator trained to detect exposure bias. The Linear Transformer, an efficient Transformer variant, is integrated with a generative adversarial network (GAN) to form the proposed model. To overcome the difficulty of applying GANs to discrete sequence data, we use policy gradients and introduce a new discriminator structure that scores the reward of the current sequence along several dimensions of musical information. The discriminator is trained with both cross-entropy losses over these information dimensions and a music-theoretic mechanism. Our experiments show that the proposed model generates music that is more consistent with music theory and is perceived as more pleasurable by listeners, a conclusion supported by both objective metrics and human evaluation. Overall, our approach offers a promising solution to the exposure bias problem in long-structured music generation and a more effective means of generating music that adheres to established music-theoretic principles.
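The core training idea described above — using a policy gradient so that a discriminator's reward on a discrete token sequence can update the generator — can be illustrated with a minimal REINFORCE sketch. This is a toy, assumed illustration, not the paper's implementation: `vocab_size`, `seq_len`, and the repetition-penalizing `discriminator_reward` are hypothetical stand-ins for the paper's Linear Transformer generator and multi-dimensional, music-theory-informed discriminator.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, seq_len = 4, 8
logits = np.zeros(vocab_size)  # toy generator: one categorical policy per step

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def discriminator_reward(seq):
    # Stand-in for a trained discriminator: rewards sequences that avoid
    # immediate token repetition (a toy proxy for music-theoretic scoring).
    repeats = sum(int(a == b) for a, b in zip(seq, seq[1:]))
    return 1.0 - repeats / (len(seq) - 1)

def sample_sequence():
    probs = softmax(logits)
    return [rng.choice(vocab_size, p=probs) for _ in range(seq_len)]

lr = 0.5
for _ in range(200):
    seq = sample_sequence()
    reward = discriminator_reward(seq)      # scalar reward for whole sequence
    probs = softmax(logits)
    grad = np.zeros(vocab_size)
    for tok in seq:
        onehot = np.eye(vocab_size)[tok]
        grad += onehot - probs               # grad of log-prob of chosen token
    # REINFORCE: scale the log-probability gradient by the discriminator reward
    logits += lr * reward * grad / seq_len
```

In the paper's full setting, the reward would come from a discriminator evaluating several dimensions of musical information rather than this single repetition heuristic, and the generator would be a Linear Transformer producing context-dependent distributions at each step.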
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Tian, D., Chen, J., Gao, Z., Pan, G. (2023). Linear Transformer-GAN: A Novel Architecture to Symbolic Music Generation. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14260. Springer, Cham. https://doi.org/10.1007/978-3-031-44195-0_37
Print ISBN: 978-3-031-44194-3
Online ISBN: 978-3-031-44195-0