Melody Generation from Lyrics Using Three Branch Conditional LSTM-GAN

Srivastava, Abhishek; Duan, Wei; Shah, Rajiv Ratn; Wu, Jianming; Tang, Suhua; Li, Wei; Yu, Yi

doi:10.1007/978-3-030-98358-1_45

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13141)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

With the availability of paired lyrics-melody datasets and advances in artificial intelligence techniques, research on melody generation conditioned on lyrics has become possible. In this work, we propose a novel architecture for melody generation, the Three Branch Conditional (TBC) LSTM-GAN, which is conditioned on lyrics and consists of an LSTM-based generator and an LSTM-based discriminator. The generator is composed of three identical and independent lyrics-conditioned LSTM-based sub-networks (branches), each responsible for generating one attribute of a melody. To handle discrete-valued sequence generation, we use the Gumbel-Softmax technique to train the GAN. Through extensive experiments, we show that the proposed model generates tuneful and plausible melodies from the given lyrics and outperforms the current state-of-the-art models both quantitatively and qualitatively.
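As an illustration of the Gumbel-Softmax technique named in the abstract, the following is a minimal sketch in plain Python and is not the authors' implementation. The function name `gumbel_softmax`, the four-way `pitch_logits` vector, and the temperature values are assumptions chosen for the example. The sketch perturbs the logits with Gumbel noise and applies a temperature-scaled softmax, which yields a differentiable approximation of a one-hot sample; this is what lets gradients flow through discrete choices during GAN training.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Draw one relaxed one-hot sample from a categorical distribution."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise: g = -log(-log(u)), with u ~ Uniform(0, 1).
        # Clamp u so that both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
pitch_logits = [2.0, 0.5, -1.0, 0.1]  # hypothetical scores over four pitch classes
soft = gumbel_softmax(pitch_logits, temperature=1.0, rng=rng)
sharp = gumbel_softmax(pitch_logits, temperature=0.1, rng=rng)
print(soft)
print(sharp)
```

Lower temperatures push the output toward a one-hot vector, while higher temperatures give smoother samples with lower-variance gradients; how the temperature is set or annealed in the paper is not stated in the abstract.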

A. Srivastava was involved in this work during his internship at the National Institute of Informatics, Tokyo, Japan.

The second author contributed equally to this work with the first author.

Author information

Authors and Affiliations

Digital Content and Media Sciences Research Division, National Institute of Informatics, SOKENDAI, Tokyo, Japan
Wei Duan & Yi Yu
MDMA Lab, Indraprastha Institute of Information Technology, Delhi, India
Abhishek Srivastava & Rajiv Ratn Shah
KDDI Research, Inc, Tokyo, Japan
Jianming Wu
Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo, Japan
Suhua Tang
School of Computer Science and Technology, Fudan University, Shanghai, China
Wei Li

Corresponding author

Correspondence to Yi Yu.

Editor information

Editors and Affiliations

IT University of Copenhagen, Copenhagen, Denmark
Björn Þór Jónsson
Dublin City University, Dublin, Ireland
Cathal Gurrin
University of Science, VNU-HCM, Ho Chi Minh City, Vietnam
Minh-Triet Tran
University of Bergen, Bergen, Norway
Duc-Tien Dang-Nguyen
National Tsing Hua University, Hsinchu, Taiwan
Anita Min-Chun Hu
Hanoi University of Science and Technology, Hanoi, Vietnam
Binh Huynh Thi Thanh
Median Technologies, Valbonne, France
Benoit Huet

About this paper

Cite this paper

Srivastava, A. et al. (2022). Melody Generation from Lyrics Using Three Branch Conditional LSTM-GAN. In: Þór Jónsson, B., et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13141. Springer, Cham. https://doi.org/10.1007/978-3-030-98358-1_45

DOI: https://doi.org/10.1007/978-3-030-98358-1_45
Published: 15 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98357-4
Online ISBN: 978-3-030-98358-1
eBook Packages: Computer Science; Computer Science (R0)
