Abstract
Generating long-structured music comparable to human compositions remains an open research problem. Since their introduction, Transformer models and their variants, which rely on self-attention, have become popular for long-structured music generation. However, these models are trained with teacher forcing, which causes an exposure bias problem: the generative model cannot reliably produce music that adheres to music theory. To address this issue, we propose a new Linear Transformer-GAN architecture that generates high-quality music using a discriminator trained to detect exposure bias. The Linear Transformer, an efficient Transformer variant, is integrated with a generative adversarial network (GAN) to form the proposed model. To overcome the difficulty of applying GANs to discrete sequence data, we use policy gradients and introduce a new discriminator structure that scores the reward of the current sequence along several dimensions of musical information. The discriminator is trained with both cross-entropy losses over these information dimensions and a music-theoretic mechanism. Our experiments show that the proposed model generates music that is more consistent with music theory and is perceived as more pleasurable by listeners, a conclusion supported by both objective metrics and human evaluation. Overall, our approach offers a promising solution to the exposure bias problem in long-structured music generation and a more effective means of generating music that adheres to established music-theoretic principles.
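The core training idea described above — using a policy gradient so that a discriminator's reward on a discrete token sequence can update the generator — can be illustrated with a minimal REINFORCE sketch. This is a toy, assumed illustration, not the paper's implementation: `vocab_size`, `seq_len`, and the repetition-penalizing `discriminator_reward` are hypothetical stand-ins for the paper's Linear Transformer generator and multi-dimensional, music-theory-informed discriminator.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, seq_len = 4, 8
logits = np.zeros(vocab_size)  # toy generator: one categorical policy per step

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def discriminator_reward(seq):
    # Stand-in for a trained discriminator: rewards sequences that avoid
    # immediate token repetition (a toy proxy for music-theoretic scoring).
    repeats = sum(int(a == b) for a, b in zip(seq, seq[1:]))
    return 1.0 - repeats / (len(seq) - 1)

def sample_sequence():
    probs = softmax(logits)
    return [rng.choice(vocab_size, p=probs) for _ in range(seq_len)]

lr = 0.5
for _ in range(200):
    seq = sample_sequence()
    reward = discriminator_reward(seq)      # scalar reward for whole sequence
    probs = softmax(logits)
    grad = np.zeros(vocab_size)
    for tok in seq:
        onehot = np.eye(vocab_size)[tok]
        grad += onehot - probs               # grad of log-prob of chosen token
    # REINFORCE: scale the log-probability gradient by the discriminator reward
    logits += lr * reward * grad / seq_len
```

In the paper's full setting, the reward would come from a discriminator evaluating several dimensions of musical information rather than this single repetition heuristic, and the generator would be a Linear Transformer producing context-dependent distributions at each step.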
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Tian, D., Chen, J., Gao, Z., Pan, G. (2023). Linear Transformer-GAN: A Novel Architecture to Symbolic Music Generation. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14260. Springer, Cham. https://doi.org/10.1007/978-3-031-44195-0_37
Print ISBN: 978-3-031-44194-3
Online ISBN: 978-3-031-44195-0