ISCA Archive Interspeech 2021

Universal Speaker Extraction in the Presence and Absence of Target Speakers for Speech of One and Two Talkers

Marvin Borsdorf, Chenglin Xu, Haizhou Li, Tanja Schultz

Speaker extraction has mostly been studied for scenarios in which a target speaker is present in a mixture of two or more talkers. Such scenarios do not adequately reflect everyday conversations. For example, a target speaker can be the only active talker, or can be quiet for a while or leave the conversation, in which case the target speaker is absent from the mixture. Traditional speaker extraction models fail in these scenarios. We propose a novel speaker extraction approach to handle speech mixtures with one or two talkers in which the target speaker can be either present or absent. First, we formulate four speaker extraction conditions to cover the typical scenarios of everyday conversations with one and two talkers. Second, we introduce a joint training scheme with one unified loss function that works for all four conditions. We show that only a small amount of data is required to adapt the model to work well in the four conditions.


doi: 10.21437/Interspeech.2021-1939

Cite as: Borsdorf, M., Xu, C., Li, H., Schultz, T. (2021) Universal Speaker Extraction in the Presence and Absence of Target Speakers for Speech of One and Two Talkers. Proc. Interspeech 2021, 1469-1473, doi: 10.21437/Interspeech.2021-1939

@inproceedings{borsdorf21_interspeech,
  author={Marvin Borsdorf and Chenglin Xu and Haizhou Li and Tanja Schultz},
  title={{Universal Speaker Extraction in the Presence and Absence of Target Speakers for Speech of One and Two Talkers}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1469--1473},
  doi={10.21437/Interspeech.2021-1939}
}