Published November 4, 2023 | Version v1
Conference paper
Open
Data Collection in Music Generation Training Sets: A Critical Analysis
Creators
Description
The practices of data collection in training sets for Automatic Music Generation (AMG) tasks are opaque and overlooked. In this paper, we aimed to identify these practices and surface the values they embed. We systematically identified all datasets used to train AMG models presented at the last ten editions of ISMIR. For each dataset, we checked how it was populated and the extent to which musicians wittingly contributed to its creation. Almost half of the datasets (42.6%) were indiscriminately populated by accumulating music data available online without seeking any sort of permission. We discuss the ideologies that underlie this practice and propose a number of suggestions AMG dataset creators might follow. Overall, this paper contributes to the emerging self-critical corpus of work of the ISMIR community, reflecting on the ethical considerations and the social responsibility of our work.
Files
Name | Size |
---|---|
000003.pdf (md5:fd4d6d2e28f66f6d170a5da81a78d4f6) | 169.7 kB |