Data Collection in Music Generation Training Sets: A Critical Analysis

Fabio Morreale; Megha Sharma; I-Chieh Wei

doi:10.5281/zenodo.10265217

Published November 4, 2023 | Version v1

Conference paper Open

The practices of data collection in training sets for Automatic Music Generation (AMG) tasks are opaque and overlooked. In this paper, we aimed to identify these practices and surface the values they embed. We systematically identified all datasets used to train AMG models presented at the last ten editions of ISMIR. For each dataset, we checked how it was populated and the extent to which musicians wittingly contributed to its creation. Almost half of the datasets (42.6%) were indiscriminately populated by accumulating music data available online without seeking any sort of permission. We discuss the ideologies that underlie this practice and propose a number of suggestions AMG dataset creators might follow. Overall, this paper contributes to the emerging self-critical corpus of work of the ISMIR community, reflecting on the ethical considerations and the social responsibility of our work.

Files

Name	Size
000003.pdf md5:fd4d6d2e28f66f6d170a5da81a78d4f6	169.7 kB

	All versions	This version
Views	431	431
Downloads	421	421
Data volume	82.3 MB	82.3 MB

DOI

10.5281/zenodo.10265217

Resource type

Conference paper

Publisher

ISMIR

Imprint

Proceedings of the 24th International Society for Music Information Retrieval Conference, 37-46. Milan, Italy.

Conference

International Society for Music Information Retrieval Conference (ISMIR 2023), Milan, Italy, November 5-9, 2023

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: December 5, 2023
Modified: July 10, 2024
