ISCA Archive Odyssey 2022

Single-Channel Target Speaker Separation Using Joint Training with Target Speaker's Pitch Information

Jincheng He, Yuanyuan Bao, Na Xu, Hongfeng Li, Shicong Li, Linzhang Wang, Fei Xiang, Ming Li

Despite the great progress achieved in the target speaker separation (TSS) task, there is still a need for robust ways to improve performance that are independent of the model architecture and the training loss. Pitch extraction plays an important role in many applications such as speech enhancement and speech separation, but it is challenging when multiple speakers are present in the same utterance. In this paper, we explore whether target speaker pitch extraction is feasible and how the extracted target pitch can help improve TSS performance. A target pitch extraction model is built and incorporated into different TSS models using two strategies, namely concatenation and joint training. Experimental results on the LibriSpeech dataset show that both strategies bring significant improvements to the TSS task, even though the precision of the target pitch extraction module is not high.


doi: 10.21437/Odyssey.2022-42

Cite as: He, J., Bao, Y., Xu, N., Li, H., Li, S., Wang, L., Xiang, F., Li, M. (2022) Single-Channel Target Speaker Separation Using Joint Training with Target Speaker's Pitch Information. Proc. The Speaker and Language Recognition Workshop (Odyssey 2022), 301-305, doi: 10.21437/Odyssey.2022-42

@inproceedings{he22_odyssey,
  author={Jincheng He and Yuanyuan Bao and Na Xu and Hongfeng Li and Shicong Li and Linzhang Wang and Fei Xiang and Ming Li},
  title={{Single-Channel Target Speaker Separation Using Joint Training with Target Speaker's Pitch Information}},
  year=2022,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2022)},
  pages={301--305},
  doi={10.21437/Odyssey.2022-42}
}