Abstract
Lyric-to-melody (L2M) generation has attracted significant attention in recent years. However, existing L2M systems struggle to integrate user-specified and automatically generated lyrics, to ensure structural coherence in melodies, and to achieve precise lyric-melody alignment. To address these issues, this paper proposes the Chat Structure Transformer (CST), an L2M system that combines ChatGPT with a Structure Transformer. Specifically, CST leverages ChatGPT's advanced text generation capabilities to produce thematically consistent lyrics while also accommodating user-specified lyrics, thereby making lyric generation more flexible. The Structure Transformer, in turn, introduces the StruAttention module for automatic recognition of musical structures and employs a customized loss function based on reinforcement learning principles. Together, these components improve the structural coherence and lyric-melody alignment of the generated melodies. Both subjective and objective evaluations demonstrate that CST produces higher-quality melodies than previous systems. Our code is available at https://github.com/liuasdeu/cst, and generated music samples are available at https://github.com/liuasdeu/cst/tree/main/evaluation/cst.
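As a rough illustration of the two mechanisms the abstract names, the sketch below shows, in PyTorch-style Python, (1) an attention step biased by automatically recognized musical sections and (2) a loss that mixes cross-entropy with a reinforcement-learning-style reward term. This is a minimal sketch under stated assumptions, not the authors' implementation: the names stru_attention and cst_style_loss, the additive same-section bias, and the REINFORCE-style surrogate are illustrative choices.

```python
import torch
import torch.nn.functional as F

def stru_attention(q, k, v, section_ids):
    """Scaled dot-product attention biased toward tokens in the same
    (automatically recognized) musical section.
    q, k, v: (batch, seq, dim); section_ids: (batch, seq) integer tensor."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5          # (batch, seq, seq)
    # Hypothetical structural bias: raise scores between positions that
    # share a section label (e.g. verse/chorus).
    same_section = section_ids.unsqueeze(-1) == section_ids.unsqueeze(-2)
    scores = scores + same_section.float()
    return F.softmax(scores, dim=-1) @ v

def cst_style_loss(logits, targets, reward, beta=0.1):
    """Token-level cross-entropy plus a REINFORCE-style surrogate that
    weights sequence log-probability by a scalar alignment reward.
    logits: (batch, seq, vocab); targets: (batch, seq); reward: (batch,)."""
    ce = F.cross_entropy(logits.transpose(1, 2), targets)
    log_probs = F.log_softmax(logits, dim=-1)
    tok_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    rl = -(reward * tok_lp.mean(dim=-1)).mean()          # reward-weighted log-prob
    return ce + beta * rl

# Example usage: 2 songs, 16 notes, 64-dim embeddings, 4 section labels.
q = k = v = torch.randn(2, 16, 64)
sections = torch.randint(0, 4, (2, 16))
out = stru_attention(q, k, v, sections)
```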





Data availability
The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.
Author information
Contributions
R.H. and R.L. proposed the main ideas and conducted the related experiments, while T.P. and X.H. supervised and guided the research project. R.H. and R.L. wrote the manuscript, and T.P. and X.H. revised the manuscript and organized the figures and tables. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Communicated by Bing-kun Bao.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
He, R., Liu, R., Peng, T. et al. CST: a melody generation method based on ChatGPT and Structure Transformer. Multimedia Systems 31, 244 (2025). https://doi.org/10.1007/s00530-025-01802-9