short-paper

More Than Simply Masking: Exploring Pre-training Strategies for Symbolic Music Understanding

Authors:

Hongfei Lin

ICMR '23: Proceedings of the 2023 ACM International Conference on Multimedia Retrieval

Pages 540 - 544

https://doi.org/10.1145/3591106.3592237

Published: 12 June 2023

Abstract

Pre-trained language models have become the prevailing approach to natural language processing tasks in recent years. Because symbolic music and natural language text are both sequential, it is natural to apply pre-training methods to symbolic music data. However, the differences between music and natural language text make it difficult to model the unique features of music comprehensively with traditional text-based pre-training strategies alone. To address this challenge, we design the quad-attribute masking (QM) strategy and propose the key prediction (KP) task to improve the extraction of generic knowledge from symbolic music. We evaluate the impact of various pre-training strategies on several public symbolic music datasets. The experimental results show that the proposed multi-task pre-training model effectively captures music domain knowledge from symbolic music data and significantly improves performance on downstream tasks.
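The abstract names quad-attribute masking but does not define it here. As an illustration only, the sketch below assumes each note is a tuple of four attribute ids (bar, position, pitch, duration) and that a selected note has all four attributes masked together, rather than one attribute at a time. The `quad_mask` function, the mask ids in `MASK`, and the 15% masking rate are hypothetical choices for this sketch, not details taken from the paper.

```python
import random

# Hypothetical mask ids, one per attribute vocabulary (assumed, not from the paper).
ATTRS = ("bar", "position", "pitch", "duration")
MASK = {"bar": 2, "position": 17, "pitch": 87, "duration": 65}
IGNORE = -100  # label value meaning "this position is not a prediction target"


def quad_mask(notes, mask_prob=0.15, rng=None):
    """Mask whole notes: all four attributes of a selected note are hidden together.

    notes: list of 4-tuples of attribute ids (bar, position, pitch, duration).
    Returns (masked_notes, labels); labels hold the original ids at masked
    positions and IGNORE everywhere else.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for note in notes:
        if rng.random() < mask_prob:
            # Replace every attribute with that attribute's mask id.
            masked.append(tuple(MASK[a] for a in ATTRS))
            labels.append(tuple(note))
        else:
            masked.append(tuple(note))
            labels.append((IGNORE,) * len(ATTRS))
    return masked, labels


if __name__ == "__main__":
    example = [(0, 0, 60, 4), (0, 4, 64, 4), (1, 0, 67, 8)]
    print(quad_mask(example, mask_prob=0.5, rng=random.Random(0)))
```

Masking all attributes of a note at once prevents a model from recovering a hidden attribute from the note's remaining attributes, which is one plausible motivation for masking at the note level. The value `-100` follows the common convention of marking ignored targets for a cross-entropy loss.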

Index Terms

More Than Simply Masking: Exploring Pre-training Strategies for Symbolic Music Understanding
1. Applied computing
  1. Arts and humanities
    1. Sound and music computing
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Information & Contributors

Information

Published In

ICMR '23: Proceedings of the 2023 ACM International Conference on Multimedia Retrieval

June 2023

694 pages

ISBN: 9798400701788

DOI: 10.1145/3591106

Editors:
Ioannis (Yiannis) Kompatsiaris, Centre for Research and Technology Hellas, Greece
Jiebo Luo, University of Rochester, USA
Nicu Sebe, University of Trento, Italy
Angela Yao, National University of Singapore, Singapore
Vasileios Mezaris, Centre for Research and Technology Hellas, Greece
Symeon Papadopoulos, Centre for Research and Technology Hellas, Greece
Adrian Popescu, CEA LIST, France
Zi (Helen) Huang, University of Queensland, Australia

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper
Research
Refereed limited

Funding Sources

National Natural Science Foundation of China

Conference

ICMR '23

Sponsor:

SIGMM

ICMR '23: International Conference on Multimedia Retrieval

June 12 - 15, 2023

Thessaloniki, Greece

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%
