Polyphonic music generation generative adversarial network with Markov decision process

Multimedia Tools and Applications

Abstract

Polyphonic music composition combines two or more independent melodies through compositional technique. However, because polyphonic music sequences are highly diverse and neural networks have limitations, it is difficult to generate chords or melodies beyond those in the training data. In addition, as the music sequence grows longer, the generator becomes more likely to produce the same note repeatedly, which destroys the coherence of the music. This paper therefore proposes a novel polyphonic music generation model that combines ideas from the Markov decision process (MDP) and Monte Carlo tree search (MCTS) with an improved Wasserstein Generative Adversarial Network (WGAN). Through the zero-sum game and conditional constraints between the generator and the discriminator, the model comes closer to unconstrained music creation, and lengthening the music sequence does not degrade its coherence. Experimental results show that the proposed algorithm performs better on polyphonic music generation than the latest methods.
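To make the zero-sum game mentioned above concrete, the following is a minimal sketch of the standard WGAN losses with weight clipping, as described in the original WGAN formulation. It does not reproduce this paper's improved model, its MDP or MCTS components, or its conditional constraints. The function names and the sample scores are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def critic_loss(real_scores, fake_scores):
    """Standard WGAN critic loss.

    Minimizing this value maximizes E[D(real)] - E[D(fake)],
    the critic's estimate of the Wasserstein distance.
    """
    return fmean(fake_scores) - fmean(real_scores)


def generator_loss(fake_scores):
    """Standard WGAN generator loss.

    The generator tries to raise the critic's score on its own samples,
    which works against the critic's objective (the zero-sum game).
    """
    return -fmean(fake_scores)


def clip_weights(weights, c=0.01):
    """Clip each critic weight to [-c, c], the simple Lipschitz
    constraint used in the original WGAN."""
    return [max(-c, min(c, w)) for w in weights]


# Hypothetical critic scores, for illustration only.
real = [0.9, 0.7, 0.8]  # scores on real music segments
fake = [0.1, 0.3, 0.2]  # scores on generated segments

print("critic loss:", critic_loss(real, fake))
print("generator loss:", generator_loss(fake))
print("clipped weights:", clip_weights([0.5, -0.5, 0.001]))
```

In this sketch, a more negative critic loss means the critic separates real from generated segments more clearly, while the generator lowers its own loss by pushing the critic's scores on generated segments upward.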

Acknowledgements

This work was supported by the Guangdong Provincial Key Platform and Major Scientific Research Projects under Grant 2018GXJK138.

Author information

Contributions

Conceptualization, Huang, W.K. and Xue, Y.H.; methodology, Xue, Y.H. and Xu, Z.F.; software, Xue, Y.H.; validation, Huang, W.K. and Xue, Y.H.; formal analysis, Xue, Y.H. and Xu, Z.F.; investigation, Huang, W.K. and Xue, Y.H.; resources, Huang, W.K. and Xue, Y.H.; data curation, Huang, W.K. and Xue, Y.H.; writing – original draft preparation, Xue, Y.H. and Xu, Z.F.; writing – review and editing, Huang, W.K. and Xu, Z.F.; visualization, Xue, Y.H. and Peng, G.L.; supervision, Huang, W.K. and Xu, Z.F.; project administration, Huang, W.K. and Wu, Y.; funding acquisition, Huang, W.K. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Wenkai Huang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Huang, W., Xue, Y., Xu, Z. et al. Polyphonic music generation generative adversarial network with Markov decision process. Multimed Tools Appl 81, 29865–29885 (2022). https://doi.org/10.1007/s11042-022-12925-w

  • DOI: https://doi.org/10.1007/s11042-022-12925-w
