Conditional hybrid GAN for melody generation from lyrics

Yu, Yi; Zhang, Zhe; Duan, Wei; Srivastava, Abhishek; Shah, Rajiv; Ren, Yi

doi:10.1007/s00521-022-07863-5

Original Article
Published: 08 October 2022

Volume 35, pages 3191–3202, (2023)

Neural Computing and Applications

Yi Yu ORCID: orcid.org/0000-0002-0294-6620¹,
Zhe Zhang¹,
Wei Duan¹,
Abhishek Srivastava²,
Rajiv Shah² &
…
Yi Ren³

Abstract

Conditional sequence generation aims to guide the generation procedure by conditioning the model on additional context information, which is an interesting research problem in AI and machine learning. Unfortunately, current state-of-the-art generative models for music fail to generate good melodies because music attributes are discrete-valued. In this paper, we propose a novel conditional hybrid GAN (C-Hybrid-GAN) for melody generation from lyrics. Three discrete sequences corresponding to the music attributes pitch, duration, and rest are generated separately by the melody generation model, all conditioned on the same lyrics. Gumbel-Softmax is used to approximate the distribution of discrete-valued samples so that discrete melody attributes can be generated directly. Most importantly, a hybrid structure is proposed that contains three independent branches (one per melody attribute) in the generator and a single branch in the discriminator that distinguishes the concatenated attributes. A relational memory core is used to model both the dependency within each attribute sequence during generator training and the consistency among the three attribute sequences during discriminator training. Through extensive experiments with evaluation metrics such as maximum mean discrepancy, average rest value, and MIDI number transition, we demonstrate that the proposed C-Hybrid-GAN outperforms existing methods for melody generation from lyrics.
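To make the Gumbel-Softmax step in the abstract concrete, the following is a minimal sketch of the general relaxation (as described by Jang et al. and Maddison et al. in the reference list), not the authors' implementation. The function name `gumbel_softmax`, the example logits, and the temperature value are assumptions chosen for illustration only.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Return a relaxed (soft) one-hot sample over the categories in `logits`."""
    noisy = []
    for logit in logits:
        # Clamp u strictly inside (0, 1) so both logarithms are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        # Gumbel(0, 1) noise by inverse transform sampling: g = -log(-log(u)).
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Softmax with the maximum subtracted for numerical stability.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits over four pitch classes at one time step.
rng = random.Random(0)
pitch_logits = [2.0, 0.5, 0.1, -1.0]
soft = gumbel_softmax(pitch_logits, temperature=0.5, rng=rng)

# The discrete attribute is the index of the largest relaxed probability.
hard_index = max(range(len(soft)), key=soft.__getitem__)
```

Lowering the temperature pushes `soft` toward a one-hot vector, while the softmax keeps the sample differentiable with respect to the logits. That differentiability is what allows a discriminator's gradient to reach a generator that emits discrete symbols.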

Data Availability

The datasets generated and/or analyzed during the current study are available in the repository accompanying [1]: https://github.com/yy1lab/Lyrics-Conditioned-Neural-Melody-Generation.

Notes

https://drive.google.com/file/d/1ozUVA5suGAERP9sgdc5q3NKkRXj3jkhE/view.

References

Yu Y, Srivastava A, Canales S (2021) Conditional lstm-gan for melody generation from lyrics. ACM Trans Multimed Comput Commun Appl 17(1):1–20
Kusner MJ, Hernández-Lobato JM (2016) Gans for sequences of discrete elements with the gumbel-softmax distribution. arXiv:1611.04051
Chi W, Kumar P, Yaddanapudi S, Suresh R, Isik U (2020) Generating music with a self-correcting non-chronological autoregressive model. arXiv:2008.08927
Fukayama S, Nakatsuma K, Sako S, Nishimoto T, Sagayama S (2010) Automatic song composition from the lyrics exploiting prosody of Japanese language. In: Sound and music computing conference
Monteith K, Martinez TR, Ventura D (2012) Automatic generation of melodic accompaniments for lyrics. In: International conference on computational creativity, pp 87–94
Ackerman M, Loker D (2017) Algorithmic songwriting with Alysia. In: Computational intelligence in music, sound, art and design, pp 1–16
Bao H, Huang S, Wei F, Cui L, Wu Y, Tan C, Piao S, Zhou M (2019) Neural melody composition from lyrics. In: International conference on natural language processing and Chinese computing, pp 499–511
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Liu A, Mei Y, Zhu Q, Zhu Z, Cai Z, Xie Z, Zhang M, Zhang S, Xiao J (2020) Lyrics2song: an automatic song generator for lyrics input. In: IEEE conference on multimedia information processing and retrieval, pp 388–391
Sheng Z, Song K, Tan X, Ren Y, Ye W, Zhang S, Qin T (2021) Songmass: automatic song writing with pre-training and alignment constraint. In: AAAI conference on artificial intelligence, vol 35, pp 13798–13805
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Adv Neural Inf Process Syst 27:2672–2680
Zhou D, Zhang H, Li Q, Ma J, Xu X (2022) Coutfitgan: learning to synthesize compatible outfits supervised by silhouette masks and fashion styles. In: IEEE transactions on multimedia, pp 1–15
Tuan Y-L, Lee H-Y (2019) Improving conditional sequence generative adversarial networks by stepwise evaluation. IEEE/ACM Trans Audio Speech Langu Process 27(4):788–798
Deng K, Fei T, Huang X, Peng Y (2019) Irc-gan: introspective recurrent convolutional gan for text-to-video generation. In: International joint conference on artificial intelligence, pp 2216–2222
Yu L, Zhang W, Wang J, Yu Y (2017) Seqgan: sequence generative adversarial nets with policy gradient. In: AAAI conference on artificial intelligence, vol 31, pp 2852–2858
Lin K, Li D, He X, Zhang Z, Sun M-T (2017) Adversarial ranking for language generation. Adv Neural Inf Process Syst 30:5998–6008
Guo J, Lu S, Cai H, Zhang W, Yu Y, Wang J (2018) Long text generation via adversarial training with leaked information. In: AAAI conference on artificial intelligence, vol 32, pp 5141–5148
Fedus W, Goodfellow I, Dai AM (2018) Maskgan: better text generation via filling in the______. arXiv:1801.07736
Zhang Y, Gan Z, Fan K, Chen Z, Henao R, Shen D, Carin L (2017) Adversarial feature matching for text generation. In: International conference on machine learning, pp 4006–4015
Chen L, Dai S, Tao C, Zhang H, Gan Z, Shen D, Zhang Y, Wang G, Zhang R, Carin L (2018) Adversarial text generation via feature-mover’s distance. Adv Neural Inf Process Syst 31:4671–4682
Zhao J, Kim Y, Zhang K, Rush A, LeCun Y (2018) Adversarially regularized autoencoders. In: International conference on machine learning, pp 5902–5911
Nie W, Narodytska N, Patel A (2018) Relgan: relational generative adversarial networks for text generation. In: International conference on learning representations
Zhang N (2020) Learning adversarial transformer for symbolic music generation. In: IEEE transactions on neural networks and learning systems, pp 1–10
Muhamed A, Li L, Shi X, Yaddanapudi S, Chi W, Jackson D, Suresh R, Lipton ZC, Smola AJ (2021) Symbolic music generation with transformer-gans. In: AAAI conference on artificial intelligence, vol 35, pp 408–417
Santoro A, Faulkner R, Raposo D, Rae J, Chrzanowski M, Weber T, Wierstra D, Vinyals O, Pascanu R, Lillicrap T (2018) Relational recurrent neural networks. Adv Neural Inf Process Syst 31:7310–7321
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30:5998–6008
Jang E, Gu S, Poole B (2016) Categorical reparameterization with gumbel-softmax. arXiv:1611.01144
Maddison CJ, Mnih A, Teh YW (2016) The concrete distribution: a continuous relaxation of discrete random variables. arXiv:1611.00712
Jolicoeur-Martineau A (2018) The relativistic discriminator: a key element missing from standard gan. arXiv:1807.00734
Zhu Y, Lu S, Zheng L, Guo J, Zhang W, Wang J, Yu Y (2018) Texygen: a benchmarking platform for text generation models. In: ACM SIGIR conference on research & development in information retrieval, pp 1097–1100
Smola A, Gretton A, Song L, Schölkopf B (2007) A hilbert space embedding for distributions. In: International conference on algorithmic learning theory, pp 13–31
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980

Author information

Authors and Affiliations

Digital Content and Media Sciences Research Division, National Institute of Informatics and SOKENDAI, Chiyoda-ku, Tokyo, 101-8430, Japan
Yi Yu, Zhe Zhang & Wei Duan
Indian Institute of Technology Delhi, Delhi, 110016, India
Abhishek Srivastava & Rajiv Shah
Zhejiang University, Hangzhou, 310027, Zhejiang, China
Yi Ren

Corresponding author

Correspondence to Yi Yu.

Ethics declarations

Conflict of interest

All authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yu, Y., Zhang, Z., Duan, W. et al. Conditional hybrid GAN for melody generation from lyrics. Neural Comput & Applic 35, 3191–3202 (2023). https://doi.org/10.1007/s00521-022-07863-5

Received: 06 March 2022
Accepted: 21 September 2022
Published: 08 October 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s00521-022-07863-5
