Monaural Speech Separation Using Speaker Embedding From Preliminary Separation


Abstract:

In speech separation, the identities of the speakers can be an important cue for discriminating the speech signals in a mixture and separating them better. A few recent studies have used a speaker embedding as additional information, but they often require prior information about the target speaker or rely on a noisy speaker embedding extracted from the mixture signal. In this article, we propose a monaural speech separation method in which the later separator blocks use a speaker embedding extracted from the intermediate separation results produced by the early stages of the separator network. The method applies to separator networks built from repeated blocks, such as the fully-convolutional time-domain audio separation network (Conv-TasNet) or the successive downsampling and resampling of multi-resolution features (SuDoRM-RF). The later blocks of these networks are modified to take the speaker information either as an affine transformation of the original input tensor or as an addition to it. The experimental results showed that the proposed methods significantly improved the performance of existing separation systems with a moderate number of additional parameters.
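The two ways of injecting speaker information mentioned in the abstract, an affine transformation of the input tensor or an addition to it, can be illustrated with a minimal sketch. The abstract does not say how the embedding is mapped to the modulation terms, so the sketch assumes a linear projection of the embedding to one scale and one shift per channel, broadcast over time. All names here (`linear`, `condition_affine`, `condition_add`, and the weight arguments) are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]  # feature tensor laid out as [channels][time]


def linear(weights: Matrix, bias: List[float], x: List[float]) -> List[float]:
    """Dense projection: returns weights @ x + bias."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def condition_affine(features: Matrix, embedding: List[float],
                     w_scale: Matrix, b_scale: List[float],
                     w_shift: Matrix, b_shift: List[float]) -> Matrix:
    """Affine conditioning (assumed form): scale * features + shift.

    The scale and shift are per-channel values projected from the speaker
    embedding and applied identically at every time frame.
    """
    scale = linear(w_scale, b_scale, embedding)
    shift = linear(w_shift, b_shift, embedding)
    return [[s * v + t for v in channel]
            for channel, s, t in zip(features, scale, shift)]


def condition_add(features: Matrix, embedding: List[float],
                  w_shift: Matrix, b_shift: List[float]) -> Matrix:
    """Additive conditioning (assumed form): features + projected embedding."""
    shift = linear(w_shift, b_shift, embedding)
    return [[v + t for v in channel]
            for channel, t in zip(features, shift)]


# Toy input: 2 channels, 3 time frames, 2-dimensional speaker embedding.
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
embedding = [1.0, -1.0]
w_scale, b_scale = [[0.5, 0.0], [0.0, 0.5]], [1.0, 1.0]
w_shift, b_shift = [[1.0, 1.0], [1.0, 0.0]], [0.0, 0.0]

affine_out = condition_affine(features, embedding,
                              w_scale, b_scale, w_shift, b_shift)
add_out = condition_add(features, embedding, w_shift, b_shift)
```

In a real separator the projections would be learned and the conditioned tensor would feed the next repeated block. The additive variant needs only one projection, so it adds fewer parameters than the affine variant.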
Page(s): 2753 - 2763
Date of Publication: 04 August 2021
