Published November 4, 2023 | Version v1
Conference paper Open

Data Collection in Music Generation Training Sets: A Critical Analysis

Description

The practices of data collection in training sets for Automatic Music Generation (AMG) tasks are opaque and overlooked. In this paper, we aimed to identify these practices and surface the values they embed. We systematically identified all datasets used to train AMG models presented at the last ten editions of ISMIR. For each dataset, we checked how it was populated and the extent to which musicians wittingly contributed to its creation. Almost half of the datasets (42.6%) were indiscriminately populated by accumulating music data available online without seeking any sort of permission. We discuss the ideologies that underlie this practice and propose a number of suggestions AMG dataset creators might follow. Overall, this paper contributes to the emerging self-critical corpus of work of the ISMIR community, reflecting on the ethical considerations and the social responsibility of our work.

Files

000003.pdf (169.7 kB)
md5:fd4d6d2e28f66f6d170a5da81a78d4f6