ABSTRACT
AI music composition is one of the most attractive and important topics at the intersection of artificial intelligence, music, and multimedia. Typical tasks in AI music composition include melody generation, songwriting, accompaniment generation, arrangement, performance generation, timbre rendering, sound generation, and singing voice synthesis, which cover different modalities (e.g., symbolic music scores and audio) and align well with the theme of ACM Multimedia. With the rapid development of artificial intelligence techniques such as deep learning and content creation, AI-based music composition has made rapid progress, but still faces many challenges. A thorough introduction to and review of the basics, the research progress, and the open challenges in AI music composition is therefore timely and valuable for the broad audience working on artificial intelligence, music, and multimedia. In this tutorial, we first introduce the background of AI music composition, including music basics and deep learning techniques for music composition. We then present AI music composition from two perspectives: 1) key components, which include music score generation, music performance generation, and music sound generation; and 2) advanced topics, which include the modeling of music structure, form, style, and emotion, as well as timbre synthesis, transfer, and mixing. Finally, we point out research challenges and future directions in AI music composition. This tutorial can serve both academic researchers and industry practitioners working on AI music composition.
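To make the symbolic melody generation task concrete, the sketch below shows a deliberately minimal, hypothetical illustration: a melody is treated as a sequence of MIDI pitch tokens, bigram transition statistics are counted from a toy corpus, and a new melody is sampled by walking the transition table. This is far simpler than the deep learning models the tutorial covers (e.g., Transformer- or VAE-based generators), but it captures the same sequence-modeling framing; all function names and the toy corpus here are illustrative assumptions, not part of any system discussed in the tutorial.

```python
import random
from collections import defaultdict

def train_transitions(melodies):
    """Count bigram transitions between consecutive MIDI pitches in a corpus."""
    counts = defaultdict(lambda: defaultdict(int))
    for melody in melodies:
        for prev, nxt in zip(melody, melody[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, start, length, seed=0):
    """Sample a melody of up to `length` notes by walking the transition table."""
    rng = random.Random(seed)
    melody = [start]
    for _ in range(length - 1):
        nxt_counts = counts.get(melody[-1])
        if not nxt_counts:
            break  # no known continuation for this pitch
        pitches, weights = zip(*nxt_counts.items())
        melody.append(rng.choices(pitches, weights=weights)[0])
    return melody

# Train on two toy C-major phrases (MIDI pitch numbers) and sample a new one.
corpus = [[60, 62, 64, 65, 67], [60, 64, 62, 65, 64]]
table = train_transitions(corpus)
print(generate(table, start=60, length=5))
```

Deep approaches covered in the tutorial replace the bigram table with a neural sequence model and enrich the token vocabulary with duration, velocity, chord, and bar information, but the generate-by-sampling loop is conceptually the same.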
Index Terms
- A Tutorial on AI Music Composition