Abstract:
This paper addresses the problem of multichannel source separation of ambisonics signals by combining two powerful approaches: multichannel NMF and recent single-channel deep learning (DL) based spectrum inference. Individual source spectra are estimated from the zero-order (omnidirectional) spherical harmonic (SH) signal with a Masker-Denoiser twin network, which is able to model long-term temporal patterns of a musical piece. The initialized source spectrograms are used within an SH-domain spatial covariance mixing model based on multichannel non-negative matrix factorization (MNMF), which predicts the spatial characteristics of each source in order to refine the network prediction. The proposed framework is evaluated on the task of singing voice separation with a large dataset of simulated ambisonics signals using several setups. Experimental results show that our joint DL+SH-MNMF method outperforms the individual monophonic DL-based separation, the baseline MNMF, and classical SH-domain beamforming methods.
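As background, the spectral model underlying (M)NMF factorizes a nonnegative power spectrogram into spectral templates and temporal activations. The sketch below is a generic single-channel NMF with Itakura-Saito multiplicative updates, a common building block of MNMF-style separation; it is an illustrative assumption, not the paper's SH-domain spatial covariance model, and the function name `nmf_is` and its parameters are hypothetical.

```python
import numpy as np

def nmf_is(V, K, n_iter=200, eps=1e-9, seed=0):
    """Illustrative NMF with Itakura-Saito multiplicative updates.

    V : nonnegative power spectrogram, shape (F, T)
    K : number of latent components
    Returns spectral templates W (F, K) and activations H (K, T)
    such that V is approximated by W @ H.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, K)) + eps  # spectral templates
    H = rng.random((K, T)) + eps  # temporal activations
    for _ in range(n_iter):
        # Standard IS-divergence multiplicative update for W
        Vh = W @ H + eps
        W *= ((V / Vh**2) @ H.T) / ((1.0 / Vh) @ H.T + eps)
        # ...and for H, using the refreshed model
        Vh = W @ H + eps
        H *= (W.T @ (V / Vh**2)) / (W.T @ (1.0 / Vh) + eps)
    return W, H
```

In the multichannel (MNMF) extension described in the abstract, each component is additionally tied to a spatial covariance matrix per frequency, estimated jointly with W and H; here the deep network's spectrogram estimates would serve to initialize the factors.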
Date of Conference: 06-08 October 2021
Date Added to IEEE Xplore: 16 March 2022