DOI: 10.1145/3474085.3478875

A Tutorial on AI Music Composition

Published: 17 October 2021

ABSTRACT

AI music composition is one of the most attractive and important topics in artificial intelligence, music, and multimedia. Typical tasks in AI music composition include melody generation, songwriting, accompaniment generation, arrangement, performance generation, timbre rendering, sound generation, and singing voice synthesis, which cover different modalities (e.g., symbolic music score, sound) and match well with the theme of ACM Multimedia. With the rapid development of artificial intelligence techniques such as content creation and deep learning, AI-based music composition has achieved rapid progress but still faces many challenges. A thorough introduction to and review of the basics, the research progress, and how to address the challenges in AI music composition is therefore timely and valuable for a broad audience working in artificial intelligence, music, and multimedia. In this tutorial, we first introduce the background of AI music composition, including music basics and deep learning techniques for music composition. We then present AI music composition from two perspectives: 1) key components, including music score generation, music performance generation, and music sound generation; and 2) advanced topics, including music structure/form/style/emotion modeling and timbre synthesis/transfer/mixing. Finally, we point out research challenges and future directions in AI music composition. This tutorial can serve both academic researchers and industry practitioners working on AI music composition.
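To give a concrete flavor of the "music score generation" component mentioned above, the sketch below generates a short symbolic melody as a sequence of MIDI pitch numbers. It is a purely illustrative toy (a hand-made first-order Markov chain over a C-major scale, with made-up transitions), far simpler than the Transformer/VAE/GAN models the tutorial covers, and is not taken from the tutorial itself.

```python
import random

# Toy first-order Markov model over MIDI pitches in C major.
# The transition table is hand-made for illustration; real systems
# learn such dependencies from large corpora with deep models.
TRANSITIONS = {
    60: [62, 64, 67],   # from C4, move to D4 / E4 / G4
    62: [60, 64],
    64: [62, 65, 67],
    65: [64, 67],
    67: [64, 65, 72],
    72: [67],           # from C5, resolve downward
}

def generate_melody(start=60, length=8, seed=0):
    """Return a list of `length` MIDI pitch numbers starting at `start`."""
    rng = random.Random(seed)
    melody = [start]
    while len(melody) < length:
        melody.append(rng.choice(TRANSITIONS[melody[-1]]))
    return melody

print(generate_melody())
```

A symbolic melody like this could then be passed to the later pipeline stages the tutorial describes (performance generation, then sound generation/synthesis) to become audible music.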


Published in

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021, 5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085
Copyright © 2021 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers

• abstract

Acceptance Rates

Overall acceptance rate: 995 of 4,171 submissions, 24%
