ABSTRACT
AI music composition is one of the most attractive and important topics at the intersection of artificial intelligence, music, and multimedia. Typical tasks in AI music composition include melody generation, songwriting, accompaniment generation, arrangement, performance generation, timbre rendering, sound generation, and singing voice synthesis, which cover different modalities (e.g., symbolic music scores and audio) and align well with the theme of ACM Multimedia. With the rapid development of artificial intelligence techniques such as deep learning and content creation, AI-based music composition has made rapid progress, but still faces many challenges. A thorough introduction to and review of the basics, the research progress, and the open challenges in AI music composition is therefore timely and valuable for the broad audience working on artificial intelligence, music, and multimedia. In this tutorial, we first introduce the background of AI music composition, including music basics and deep learning techniques for music composition. We then present AI music composition from two perspectives: 1) key components, which include music score generation, music performance generation, and music sound generation; and 2) advanced topics, which include the modeling of music structure, form, style, and emotion, as well as timbre synthesis, transfer, and mixing. Finally, we point out research challenges and future directions in AI music composition. This tutorial can serve both academic researchers and industry practitioners working on AI music composition.
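To make the symbolic melody generation task concrete, the sketch below shows a deliberately minimal, hypothetical illustration: a melody is treated as a sequence of MIDI pitch tokens, bigram transition statistics are counted from a toy corpus, and a new melody is sampled by walking the transition table. This is far simpler than the deep learning models the tutorial covers (e.g., Transformer- or VAE-based generators), but it captures the same sequence-modeling framing; all function names and the toy corpus here are illustrative assumptions, not part of any system discussed in the tutorial.

```python
import random
from collections import defaultdict

def train_transitions(melodies):
    """Count bigram transitions between consecutive MIDI pitches in a corpus."""
    counts = defaultdict(lambda: defaultdict(int))
    for melody in melodies:
        for prev, nxt in zip(melody, melody[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, start, length, seed=0):
    """Sample a melody of up to `length` notes by walking the transition table."""
    rng = random.Random(seed)
    melody = [start]
    for _ in range(length - 1):
        nxt_counts = counts.get(melody[-1])
        if not nxt_counts:
            break  # no known continuation for this pitch
        pitches, weights = zip(*nxt_counts.items())
        melody.append(rng.choices(pitches, weights=weights)[0])
    return melody

# Train on two toy C-major phrases (MIDI pitch numbers) and sample a new one.
corpus = [[60, 62, 64, 65, 67], [60, 64, 62, 65, 64]]
table = train_transitions(corpus)
print(generate(table, start=60, length=5))
```

Deep approaches covered in the tutorial replace the bigram table with a neural sequence model and enrich the token vocabulary with duration, velocity, chord, and bar information, but the generate-by-sampling loop is conceptually the same.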
Index Terms
- A Tutorial on AI Music Composition