Published November 4, 2023 | Version v1
Conference paper · Open Access

Carnatic Singing Voice Separation Using Cold Diffusion on Training Data With Bleeding

Description

Supervised music source separation systems using deep learning are trained by minimizing a loss function between pairs of predicted separations and ground-truth isolated sources. However, open datasets comprising isolated sources are few, small, and restricted to a few music styles. At the same time, multi-track datasets with source bleeding are usually larger and easier to compile. In this work, we address the task of singing voice separation when the ground-truth signals have bleeding and only the target vocals and the corresponding mixture are available. We train a cold diffusion model in the frequency domain to iteratively transform a mixture into the corresponding vocals with bleeding. Next, we build the final separation masks by clustering spectrogram bins according to their evolution along the transformation steps. We test our approach on a Carnatic music scenario, for which only datasets with bleeding exist, while current research on this repertoire commonly uses source separation models trained solely on Western commercial music. Our evaluation on a Carnatic test set shows that our system improves over Spleeter in interference removal and is competitive in terms of signal distortion. Code is open sourced.
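The following is a minimal sketch, not the authors' released code, of the two ideas the abstract describes: a cold-diffusion degradation between the vocals-with-bleeding target and the mixture, and a separation mask built by clustering time-frequency bins according to their evolution across the restoration steps. The linear blending schedule, the 2-cluster KMeans, the cluster-selection heuristic, and all function names (degrade, restoration_trajectory, mask_from_trajectories) are assumptions for illustration.

```python
# Hedged sketch of cold-diffusion-based vocal separation on magnitude spectrograms.
# Assumptions: a linear degradation schedule, a learned restore_fn(x, t), and a
# KMeans clustering of per-bin trajectories; none of these details are confirmed
# by the record text.

import numpy as np
from sklearn.cluster import KMeans

T = 10  # number of transformation steps (assumed)

def degrade(vocals_mag, mix_mag, t):
    """Forward (cold) degradation at step t: blend from the vocals-with-bleeding
    target (t=0) toward the full mixture (t=T)."""
    alpha = t / T
    return (1.0 - alpha) * vocals_mag + alpha * mix_mag

def restoration_trajectory(mix_mag, restore_fn):
    """Run a learned restoration operator from the mixture back toward the vocals
    estimate, keeping every intermediate spectrogram."""
    x = mix_mag.copy()
    steps = [x]
    for t in reversed(range(T)):
        x = restore_fn(x, t)   # hypothetical model predicting the step-t estimate
        steps.append(x)
    return np.stack(steps)     # shape: (T + 1, freq, time)

def mask_from_trajectories(steps, n_clusters=2):
    """Cluster each time-frequency bin by its evolution along the steps and keep
    the cluster whose bins change the most (assumed to correspond to vocals)."""
    n_steps, n_freq, n_time = steps.shape
    traj = steps.reshape(n_steps, -1).T                      # (bins, steps)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(traj)
    change = np.abs(traj[:, -1] - traj[:, 0])
    vocal_cluster = max(range(n_clusters),
                        key=lambda c: change[labels == c].mean())
    return (labels == vocal_cluster).astype(float).reshape(n_freq, n_time)
```

The resulting binary mask would be applied to the mixture spectrogram before inverting back to audio; the paper itself should be consulted for the actual model, schedule, and clustering criterion.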

Files

000065.pdf (460.9 kB)
md5:1e9a06bb062872c77845a9738afc545f