Abstract
Conditional sequence generation aims to guide the generation procedure by conditioning the model on additional context information, which is an interesting research issue in AI and machine learning. Unfortunately, current state-of-the-art generative models for music fail to generate good melodies because music attributes are discrete-valued. In this paper, we propose a novel conditional hybrid GAN (C-Hybrid-GAN) for melody generation from lyrics. Three discrete sequences corresponding to the music attributes pitch, duration, and rest are generated separately by the melody generation model, each conditioned on the same lyrics. The Gumbel-Softmax is used to approximate the distribution of discrete-valued samples so that the discrete melody attributes can be generated directly. Most importantly, a hybrid structure is proposed, containing three independent branches in the generator (one for each melody attribute) and a single branch in the discriminator that distinguishes the concatenated attributes. A relational memory core is exploited to model not only the dependency within each attribute sequence during the training of the generator, but also the consistency among the three attribute sequences during the training of the discriminator. Through extensive experiments using evaluation metrics such as maximum mean discrepancy, average rest value, and MIDI number transition, we demonstrate that the proposed C-Hybrid-GAN outperforms existing methods in melody generation from lyrics.
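The Gumbel-Softmax relaxation named in the abstract (Jang et al.; Maddison et al.) can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. The helper names `gumbel_softmax`, `argmax`, and `sample_note`, the placeholder vocabulary sizes in `VOCAB_SIZES`, and the random stand-in logits are all hypothetical. The sketch only shows how each of the three attributes (pitch, duration, rest) can receive its own relaxed one-hot vector, which is a smooth function of the logits and is what lets gradients reach the generator in an autodiff framework, alongside a hard discrete choice.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Draw a relaxed one-hot sample from unnormalized categorical logits.

    Adds Gumbel(0, 1) noise g = -log(-log(u)), u ~ Uniform(0, 1), to each
    logit, divides by the temperature, and applies a numerically stable
    softmax. Lower temperatures push the output closer to a one-hot vector.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    perturbed = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # random() lies in [0, 1); avoid log(0)
        gumbel_noise = -math.log(-math.log(u))
        perturbed.append((logit + gumbel_noise) / temperature)
    peak = max(perturbed)  # subtract the max before exp for stability
    exps = [math.exp(x - peak) for x in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest entry (the hard, discrete choice)."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical placeholder vocabulary sizes, not taken from the paper.
VOCAB_SIZES = {"pitch": 8, "duration": 5, "rest": 3}


def sample_note(logits_per_attribute, temperature, rng):
    """Sample one note with an independent Gumbel-Softmax draw per attribute.

    In a hybrid-style generator each attribute would come from its own
    branch; here the per-attribute logits are simply passed in.
    """
    soft = {name: gumbel_softmax(logits, temperature, rng)
            for name, logits in logits_per_attribute.items()}
    hard = {name: argmax(probs) for name, probs in soft.items()}
    return soft, hard


if __name__ == "__main__":
    rng = random.Random(0)
    # Random stand-in logits; a trained generator would compute these from
    # the lyrics context and its recurrent (e.g., relational memory) state.
    logits = {name: [rng.gauss(0.0, 1.0) for _ in range(size)]
              for name, size in VOCAB_SIZES.items()}
    soft, hard = sample_note(logits, temperature=0.5, rng=rng)
    for name in VOCAB_SIZES:
        print(name, hard[name], [round(p, 3) for p in soft[name]])
```

Lower values of `temperature` make each relaxed vector closer to one-hot but raise the variance of the resulting gradients; how the paper sets or anneals the temperature is not assumed here.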
Data Availability
The datasets generated and/or analyzed during the current study are available in the repository of [1], https://github.com/yy1lab/Lyrics-Conditioned-Neural-Melody-Generation.
References
1. Yu Y, Srivastava A, Canales S (2021) Conditional LSTM-GAN for melody generation from lyrics. ACM Trans Multimed Comput Commun Appl 17(1):1–20
2. Kusner MJ, Hernández-Lobato JM (2016) GANs for sequences of discrete elements with the Gumbel-softmax distribution. arXiv:1611.04051
3. Chi W, Kumar P, Yaddanapudi S, Suresh R, Isik U (2020) Generating music with a self-correcting non-chronological autoregressive model. arXiv:2008.08927
4. Fukayama S, Nakatsuma K, Sako S, Nishimoto T, Sagayama S (2010) Automatic song composition from the lyrics exploiting prosody of Japanese language. In: Sound and music computing conference
5. Monteith K, Martinez TR, Ventura D (2012) Automatic generation of melodic accompaniments for lyrics. In: International conference on computational creativity, pp 87–94
6. Ackerman M, Loker D (2017) Algorithmic songwriting with Alysia. In: Computational intelligence in music, sound, art and design, pp 1–16
7. Bao H, Huang S, Wei F, Cui L, Wu Y, Tan C, Piao S, Zhou M (2019) Neural melody composition from lyrics. In: International conference on natural language processing and Chinese computing, pp 499–511
8. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
9. Liu A, Mei Y, Zhu Q, Zhu Z, Cai Z, Xie Z, Zhang M, Zhang S, Xiao J (2020) Lyrics2song: an automatic song generator for lyrics input. In: IEEE conference on multimedia information processing and retrieval, pp 388–391
10. Sheng Z, Song K, Tan X, Ren Y, Ye W, Zhang S, Qin T (2021) SongMASS: automatic song writing with pre-training and alignment constraint. In: AAAI conference on artificial intelligence, vol 35, pp 13798–13805
11. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Adv Neural Inf Process Syst 27:2672–2680
12. Zhou D, Zhang H, Li Q, Ma J, Xu X (2022) COutfitGAN: learning to synthesize compatible outfits supervised by silhouette masks and fashion styles. In: IEEE transactions on multimedia, pp 1–15
13. Tuan Y-L, Lee H-Y (2019) Improving conditional sequence generative adversarial networks by stepwise evaluation. IEEE/ACM Trans Audio Speech Lang Process 27(4):788–798
14. Deng K, Fei T, Huang X, Peng Y (2019) IRC-GAN: introspective recurrent convolutional GAN for text-to-video generation. In: International joint conference on artificial intelligence, pp 2216–2222
15. Yu L, Zhang W, Wang J, Yu Y (2017) SeqGAN: sequence generative adversarial nets with policy gradient. In: AAAI conference on artificial intelligence, vol 31, pp 2852–2858
16. Lin K, Li D, He X, Zhang Z, Sun M-T (2017) Adversarial ranking for language generation. Adv Neural Inf Process Syst 30:5998–6008
17. Guo J, Lu S, Cai H, Zhang W, Yu Y, Wang J (2018) Long text generation via adversarial training with leaked information. In: AAAI conference on artificial intelligence, vol 32, pp 5141–5148
18. Fedus W, Goodfellow I, Dai AM (2018) MaskGAN: better text generation via filling in the ______. arXiv:1801.07736
19. Zhang Y, Gan Z, Fan K, Chen Z, Henao R, Shen D, Carin L (2017) Adversarial feature matching for text generation. In: International conference on machine learning, pp 4006–4015
20. Chen L, Dai S, Tao C, Zhang H, Gan Z, Shen D, Zhang Y, Wang G, Zhang R, Carin L (2018) Adversarial text generation via feature-mover’s distance. Adv Neural Inf Process Syst 31:4671–4682
21. Zhao J, Kim Y, Zhang K, Rush A, LeCun Y (2018) Adversarially regularized autoencoders. In: International conference on machine learning, pp 5902–5911
22. Nie W, Narodytska N, Patel A (2018) RelGAN: relational generative adversarial networks for text generation. In: International conference on learning representations
23. Zhang N (2020) Learning adversarial transformer for symbolic music generation. In: IEEE transactions on neural networks and learning systems, pp 1–10
24. Muhamed A, Li L, Shi X, Yaddanapudi S, Chi W, Jackson D, Suresh R, Lipton ZC, Smola AJ (2021) Symbolic music generation with Transformer-GANs. In: AAAI conference on artificial intelligence, vol 35, pp 408–417
25. Santoro A, Faulkner R, Raposo D, Rae J, Chrzanowski M, Weber T, Wierstra D, Vinyals O, Pascanu R, Lillicrap T (2018) Relational recurrent neural networks. Adv Neural Inf Process Syst 31:7310–7321
26. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30:5998–6008
27. Jang E, Gu S, Poole B (2016) Categorical reparameterization with Gumbel-softmax. arXiv:1611.01144
28. Maddison CJ, Mnih A, Teh YW (2016) The concrete distribution: a continuous relaxation of discrete random variables. arXiv:1611.00712
29. Jolicoeur-Martineau A (2018) The relativistic discriminator: a key element missing from standard GAN. arXiv:1807.00734
30. Zhu Y, Lu S, Zheng L, Guo J, Zhang W, Wang J, Yu Y (2018) Texygen: a benchmarking platform for text generation models. In: ACM SIGIR conference on research & development in information retrieval, pp 1097–1100
31. Smola A, Gretton A, Song L, Schölkopf B (2007) A Hilbert space embedding for distributions. In: International conference on algorithmic learning theory, pp 13–31
32. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980
Ethics declarations
Conflict of interest
All authors declare that they have no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yu, Y., Zhang, Z., Duan, W. et al. Conditional hybrid GAN for melody generation from lyrics. Neural Comput & Applic 35, 3191–3202 (2023). https://doi.org/10.1007/s00521-022-07863-5
DOI: https://doi.org/10.1007/s00521-022-07863-5