
An automatic music generation method based on RSCLN_Transformer network

  • Regular Paper
  • Published in Multimedia Systems

Abstract

With the development of artificial intelligence and deep learning, many music generation methods have been proposed, and the Transformer in particular has recently been widely applied to music generation. However, the structural complexity of music places high demands on generation models. In this paper, we propose a new automatic music generation network that combines a Recursive Skip Connection with Layer Normalization (RSCLN) model, a Transformer-XL model, and a multi-head attention mechanism. Our method not only alleviates the vanishing-gradient problem during training, but also strengthens the model's ability to capture correlations between earlier and later musical information, so that the generated works are closer to the style of the original music. The effectiveness of the RSCLN_Transformer-XL automatic music generation method is verified through music similarity evaluation experiments, using both a music structure similarity measure and a listening test. The experimental results show that the RSCLN_Transformer-XL model generates better music than the Transformer-XL model.
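The paper itself does not include an implementation, but the RSCLN idea follows Liu et al. [22]: the block input is re-injected through the skip connection several times, with layer normalization after each addition. Below is a minimal PyTorch sketch of one such block; the recursion depth k, the model width, and the choice of multi-head self-attention as the wrapped sublayer are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn


class RSCLN(nn.Module):
    """Recursive skip connection with layer normalization (after Liu et al. [22]).

    The input x is re-injected k times around the sublayer output, with a
    LayerNorm after each addition:
        h_1 = LN(x + f(x)),  h_i = LN(x + h_{i-1}),  output = h_k.
    """

    def __init__(self, sublayer: nn.Module, d_model: int, k: int = 2):
        super().__init__()
        self.sublayer = sublayer
        # One LayerNorm per recursion step, since each normalizes a different sum.
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(k))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.sublayer(x)
        for norm in self.norms:
            h = norm(x + h)  # re-inject the original input at every step
        return h


class SelfAttention(nn.Module):
    """Thin wrapper so multi-head self-attention can serve as the sublayer f."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(x, x, x)
        return out


# Toy usage: wrap an 8-head self-attention sublayer of width 512.
block = RSCLN(SelfAttention(512), d_model=512, k=2)
tokens = torch.randn(1, 16, 512)   # (batch, sequence length, features)
print(block(tokens).shape)         # torch.Size([1, 16, 512])
```

Because the raw input is added back at every recursion step, the gradient always has a short path to the block input, which is the mechanism the abstract credits for alleviating vanishing gradients.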


Data Availability

The dataset used during the current study can be obtained from reference [26].

References

  1. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)

  2. Hemalatha, E.: Artificial music generation using LSTM networks. Int. J. Eng. Adv. Technol. 9(2), 4315–4319 (2019)

  3. Ebcioğlu, K.: An expert system for harmonizing chorales in the style of J. S. Bach. J. Logic Program. 8(1), 145–185 (1990)

  4. Salas, H., Gelbukh, A., Calvo, H.: Automatic music composition with simple probabilistic generative grammars. Polibits. 44(9), 59–65 (2011)

  5. Feng, Y., Zhou, C.L.: Advances in algorithmic composition. J. Software. 10(2), 209–215 (2006)

  6. Cao, X.Z., Zhang, A.L., Xu, J.C.: Intelligent music composition technology research based on genetic algorithm. Comput. Eng. Appl. 44(32), 206–209 (2008)

  7. Todd, P.M.: A connectionist approach to algorithmic composition. Comput. Music. J. 13(4), 27–43 (1989)

  8. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

  9. Eck, D., Schmidhuber, J.: A first look at music composition using LSTM recurrent neural networks. Istituto Dalle Molle di Studi sull'Intelligenza Artificiale. 103(4), 48 (2002)

  10. Li, S., Sung, Y.: INCO-GAN: variable-length music generation method based on inception model-based conditional GAN. Mathematics. 9(4), 102–110 (2021)

  11. Dong, H.W., Hsiao, W.Y., Yang, L.C., Yang, Y.H.: MuseGAN: multi-track sequential generative adversarial networks for symbolic music generation and accompaniment. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence, pp. 212–225 (2018)

  12. Yang, L.C., Chou, S.Y., Yang, Y.H.: MidiNet: a convolutional generative adversarial network for symbolic-domain music generation. In: Proceedings of the 18th International Society for Music Information Retrieval Conference, pp. 324–331 (2017)

  13. Vaswani, A., Shazeer, N., Parmar, N., et al.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5998–6008 (2017)

  14. Deng, X., Chen, S.J., Chen, Y.F., Xu, J.: Multi-level convolutional transformer with adaptive ranking for semi-supervised crowd counting. In: Proceedings of the 4th International Conference on Algorithms, Computing and Artificial Intelligence, pp. 28–34 (2021)

  15. Huang, C., Vaswani, A., Uszkoreit, J., Shazeer, N., Simon, I., Hawthorne, C., Dai, A., Hoffman, M., Dinculescu, M., Eck, D.: Music transformer: generating music with long-term structure. In: International Conference on Learning Representations, pp. 364–375 (2019)

  16. Huang, Y.S., Yang, Y.H.: Pop music transformer: beat-based modeling and generation of expressive pop piano compositions. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 1180–1188 (2020)

  17. Choi, K., Hawthorne, C., Simon, I., Dinculescu, M., Engel, J.: Encoding musical style with transformer autoencoders. In: International Conference on Machine Learning, pp. 254–267 (2020)

  18. Wu, S.L., Yang, Y.H.: The Jazz Transformer on the front line: exploring the shortcomings of AI-composed music through quantitative measures. In: Proceedings of the 21st International Society for Music Information Retrieval Conference, pp. 451–463 (2020)

  19. Donahue, C., Mao, H.H., Li, Y.E., Cottrell, G.W., McAuley, J.: LakhNES: improving multi-instrumental music generation with cross-domain pre-training. In: Proceedings of the 20th International Society for Music Information Retrieval Conference, pp. 685–692 (2019)

  20. Oore, S., Simon, I., Dieleman, S., et al.: This time with feeling: learning expressive musical performance. Neural Comput. Appl. 32(4), 955–967 (2020)

  21. Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Salakhutdinov, R.: Transformer-XL: Attentive language models beyond a fixed-length context. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2978–2988 (2019)

  22. Liu, F., Ren, X., Zhang, Z., Sun, X., Zou, Y.: Rethinking Skip Connection with Layer Normalization in Transformers and ResNets. In: Proceedings of the 28th International Conference on Computational Linguistics, pp. 1324–1332 (2020)

  23. Zhang, B., Sennrich, R.: Root mean square layer normalization. In: Advances in Neural Information Processing Systems, pp. 12360–12371 (2019)

  24. Xiong, R., Yang, Y., He, D., Zheng, K., Liu, T.Y.: On layer normalization in the transformer architecture. In: International Conference on Machine Learning, pp. 10524–10533 (2020)

  25. Shaw, P., Uszkoreit, J., Vaswani, A.: Self-attention with relative position representations. In: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 464–468 (2018)

  26. Huang, Y.S., Yang, Y.H.: Pop music transformer: beat-based modeling and generation of expressive pop piano compositions. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 1180–1188 (2020)

  27. Ma, N., Zhang, X., Liu, M., et al.: Activate or not: learning customized activation. Comput. Vision Pattern Recogn. 21(5), 145–157 (2020)

  28. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  29. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(4), 623–656 (1948)

  30. Levitin, D.J.: This is your brain on music: the science of a human obsession. Plume/Penguin, New York (2006)

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (No. 62377034, 11872036), the Shaanxi Key Science and Technology Innovation Team Project (No. 2022TD-26), the Fundamental Research Fund for the Central Universities (No. GK202101004, GK202205035), the Science and Technology Plan of Xi'an city (No. 22GXFW0020), the Shaanxi Science and Technology Plan Project (No. 2023YBGY158), and the Key Laboratory of the Ministry of Culture and Tourism (No. 2023-02).

Author information

Contributions

YZ contributed to conceptualization, resources, validation, supervision, writing—review and editing. XL contributed to methodology, software, visualization, writing—original draft. QL performed methodology and writing—review and editing. XW and HY performed supervision, validation, writing—review and editing. YS was involved in writing—review and editing.

Corresponding author

Correspondence to Honghong Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Communicated by J. Gao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, Y., Lv, X., Li, Q. et al. An automatic music generation method based on RSCLN_Transformer network. Multimedia Systems 30, 4 (2024). https://doi.org/10.1007/s00530-023-01245-0

