Abstract
With the development of artificial intelligence and deep learning, many music generation methods have been proposed, and the Transformer has recently been widely adopted for the task. However, the structural complexity of music places higher demands on generation models. In this paper, we propose a new automatic music generation network that combines a Recursive Skip Connection with Layer Normalization (RSCLN) module, a Transformer-XL model, and a multi-head attention mechanism. Our method not only alleviates the vanishing-gradient problem during training, but also strengthens the model's ability to capture long-range dependencies within a piece, so that the generated works stay closer to the style of the original music. The effectiveness of the RSCLN_Transformer-XL automatic music generation method is verified through similarity evaluation experiments based on musical structure similarity and through a listening test. The experimental results show that the RSCLN_Transformer-XL model generates better music than the Transformer-XL model.
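The abstract does not spell out the RSCLN computation. The following minimal NumPy sketch shows one plausible reading of a recursive skip connection with layer normalization, in which LN(x + f(x)) is applied repeatedly so that every recursion re-exposes a short gradient path around the sublayer. The function names, the recursion depth, and the toy linear sublayer standing in for attention are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize the last dimension to zero mean and unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def rscln_block(x, sublayer, depth=2):
    """Recursive skip connection with layer normalization (assumed form):
    repeat x <- LN(x + f(x)), so each recursion adds a residual shortcut."""
    for _ in range(depth):
        x = layer_norm(x + sublayer(x))
    return x

# Toy sublayer: a fixed linear map standing in for an attention/FFN block.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)) * 0.1
out = rscln_block(rng.standard_normal((4, 8)), lambda h: h @ W)
print(out.shape)
```

Because the output of every recursion passes through layer normalization, each feature vector is re-centered and re-scaled, which is the property usually credited with stabilizing training in deep residual stacks.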
Data Availability
The dataset used during the current study can be obtained from reference [26].
References
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
Hemalatha, E.: Artificial music generation using LSTM networks. Int. J. Eng. Adv. Technol. 9(2), 4315–4319 (2019)
Ebcioğlu, K.: An expert system for harmonizing chorales in the style of J. S. Bach. J. Logic Program. 8(1), 145–185 (1990)
Salas, H., Gelbukh, A., Calvo, H.: Automatic music composition with simple probabilistic generative grammars. Polibits. 44(9), 59–65 (2011)
Feng, Y., Zhou, C.L.: Advances in algorithmic composition. J. Software. 10(2), 209–215 (2006)
Cao, X.Z., Zhang, A.L., Xu, J.C.: Intelligent music composition technology research based on genetic algorithm. Comput. Eng. Appl. 44(32), 206–209 (2008)
Todd, P.M.: A connectionist approach to algorithmic composition. Comput. Music. J. 13(4), 27–43 (1989)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Eck, D., Schmidhuber, J.: A first look at music composition using LSTM recurrent neural networks. Technical Report, Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA) (2002)
Li, S., Sung, Y.: INCO-GAN: variable-length music generation method based on inception model-based conditional GAN. Mathematics. 9(4), 102–110 (2021)
Dong, H.W., Hsiao, W.Y., Yang, L.C., Yang, Y.H.: MuseGAN: multi-track sequential generative adversarial networks for symbolic music generation and accompaniment. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, pp. 212–225 (2018)
Yang, L.C., Chou, S.Y., Yang, Y.H.: MidiNet: a convolutional generative adversarial network for symbolic-domain music generation. In: Proceedings of the 18th International Society for Music Information Retrieval Conference, pp. 324–331 (2017)
Vaswani, A., Shazeer, N., Parmar, N., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems 30, pp. 5998–6008 (2017)
Deng, X., Chen, S.J., Chen, Y.F., Xu, J.: Multi-level convolutional transformer with adaptive ranking for semi-supervised crowd counting. In: Proceedings of the 4th International Conference on Algorithms, Computing and Artificial Intelligence, pp. 28–34 (2021)
Huang, C., Vaswani, A., Uszkoreit, J., Shazeer, N., Simon, I., Hawthorne, C., Dai, A., Hoffman, M., Dinculescu, M., Eck, D.: Music transformer: generating music with long-term structure. In: International Conference on Learning Representations, pp. 364–375 (2019)
Huang, Y.S., Yang, Y.H.: Pop music transformer: beat-based modeling and generation of expressive pop piano compositions. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 1180–1188 (2020)
Choi, K., Hawthorne, C., Simon, I., Dinculescu, M., Engel, J.: Encoding musical style with transformer autoencoders. In: International Conference on Machine Learning, pp. 254–267 (2020)
Wu, S.L., Yang, Y.H.: The Jazz Transformer on the front line: exploring the shortcomings of AI-composed music through quantitative measures. In: Proceedings of the 21st International Society for Music Information Retrieval Conference, pp. 451–463 (2020)
Donahue, C., Mao, H.H., Li, Y.E., Cottrell, G.W., McAuley, J.: LakhNES: improving multi-instrumental music generation with cross-domain pre-training. In: Proceedings of the 20th International Society for Music Information Retrieval Conference, pp. 685–692 (2019)
Oore, S., Simon, I., Dieleman, S., et al.: This time with feeling: learning expressive musical performance. Neural Comput. Appl. 32(4), 955–967 (2020)
Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Salakhutdinov, R.: Transformer-XL: Attentive language models beyond a fixed-length context. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2978–2988 (2019)
Liu, F., Ren, X., Zhang, Z., Sun, X., Zou, Y.: Rethinking Skip Connection with Layer Normalization in Transformers and ResNets. In: Proceedings of the 28th International Conference on Computational Linguistics, pp. 1324–1332 (2020)
Zhang, B., Sennrich, R.: Root mean square layer normalization. In: Advances in Neural Information Processing Systems 32, pp. 12360–12371 (2019)
Xiong, R., Yang, Y., He, D., Zheng, K., Liu, T.Y.: On layer normalization in the transformer architecture. In: International Conference on Machine Learning, pp. 10524–10533 (2020)
Shaw, P., Uszkoreit, J., Vaswani, A.: Self-attention with relative position representations. In: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 464–468 (2018)
Huang, Y.S., Yang, Y.H.: Pop music transformer: beat-based modeling and generation of expressive pop piano compositions. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 1180–1188 (2020)
Ma, N., Zhang, X., Liu, M., et al.: Activate or not: learning customized activation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021)
Kingma, D., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(4), 623–656 (1948)
Levitin, D.J.: This is your brain on music: the science of a human obsession. Plume/Penguin, New York (2006)
Acknowledgements
This work is partially supported by the National Natural Science Foundation of China (No. 62377034, 11872036), the Shaanxi Key Science and Technology Innovation Team Project (No. 2022TD-26), the Fundamental Research Fund for the Central Universities (No. GK202101004, GK202205035), the Science and Technology Plan of Xi’an city (No. 22GXFW0020), Shaanxi Science and Technology Plan Project (No. 2023YBGY158), and the Key Laboratory of the Ministry of Culture and Tourism (No. 2023-02).
Author information
Authors and Affiliations
Contributions
YZ contributed to conceptualization, resources, validation, supervision, and writing—review and editing. XL contributed to methodology, software, visualization, and writing—original draft. QL contributed to methodology and writing—review and editing. XW and HY contributed to supervision, validation, and writing—review and editing. YS was involved in writing—review and editing.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Communicated by J. Gao.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, Y., Lv, X., Li, Q. et al. An automatic music generation method based on RSCLN_Transformer network. Multimedia Systems 30, 4 (2024). https://doi.org/10.1007/s00530-023-01245-0