ABSTRACT
Over the past five years, generative adversarial networks (GANs) have been applied increasingly often to image generation. Because the structure of music is random and unpredictable, music generation is also well suited to GANs. Numerous studies have addressed music generation with temporal GANs. However, few have examined the relationships between melodies and chords or the effects of the latent space on the time sequence.
We propose a new method for implementing latent structure in GANs for music generation. The main innovations of the proposed model are a new discriminator that recognizes the time sequence of music and a pretrained beat generator that improves the quality of patterned melodies and chords. The results indicated that the pretrained model improved the quality of the generated music.
Index Terms
- A Variant Model of TGAN for Music Generation