
A Variant Model of TGAN for Music Generation

Published: 03 July 2020

ABSTRACT

Over the past five years, generative adversarial networks (GANs) have attracted growing interest, particularly for image generation. Because musical structure is inherently varied and difficult to predict, music generation is well suited to GANs. Numerous studies have applied temporal GANs to music generation; however, few have examined the relationships between melodies and chords, or the effect of the latent space on the time sequence.

In this paper, we propose a new method for implementing latent structure in GANs for music generation. The main innovations of the proposed model are a new discriminator that recognizes the time sequence of the music and a pretrained beat generator that improves the quality of patterned melodies and chords. Results indicate that the pretrained model improved the quality of the generated music.
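The two-stage pipeline described above can be illustrated with a toy sketch. This is not the authors' implementation; every name, shape, and value below (number of bars, steps per bar, pitch range, the discriminator's scoring rule) is an illustrative assumption made only to show how a pretrained beat generator can condition a melody generator before a temporal discriminator scores the whole sequence.

```python
import random

# Hypothetical dimensions, chosen only for illustration.
BARS = 4    # bars per generated sample
STEPS = 16  # time steps per bar

def beat_generator(z, bars=BARS, steps=STEPS):
    """Pretrained stage (assumed): map a latent seed z to a
    binary beat pattern, one on/off flag per time step."""
    rng = random.Random(z)
    return [[rng.random() < 0.5 for _ in range(steps)] for _ in range(bars)]

def melody_generator(z, beats):
    """Second stage (assumed): emit a MIDI-like pitch only where
    the beat pattern is active, so melody follows the beat."""
    rng = random.Random(z + 1)
    return [[rng.randint(60, 72) if on else None for on in bar]
            for bar in beats]

def temporal_discriminator(melody):
    """Placeholder for a discriminator that sees the whole time
    sequence at once; here it just scores note density in [0, 1]."""
    steps = [s for bar in melody for s in bar]
    return sum(s is not None for s in steps) / len(steps)

beats = beat_generator(42)
melody = melody_generator(42, beats)
score = temporal_discriminator(melody)
```

In a real model the three functions would be neural networks and the discriminator's score would drive adversarial training; the sketch only shows the data flow: latent seed to beat pattern, beat pattern to conditioned melody, full sequence to a single temporal score.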


Published in ASSE '20: Proceedings of the 2020 Asia Service Sciences and Software Engineering Conference, May 2020, 163 pages.
ISBN: 9781450377102
DOI: 10.1145/3399871

      Copyright © 2020 ACM


Publisher: Association for Computing Machinery, New York, NY, United States



Qualifiers: research-article; refereed limited
