The HCCL System for the NIST SRE21

Li, Zhuo; Xiao, Runqiu; Chen, Hangting; Zhao, Zhenduo; Zhang, Zihan; Wang, Wenchao

doi:10.21437/Interspeech.2022-10342

This paper describes the systems developed by the HCCL team for the NIST 2021 Speaker Recognition Evaluation (NIST SRE21). We first explore various state-of-the-art speaker embedding extractors combined with a novel circle loss to obtain discriminative deep speaker embeddings. Because cross-channel and cross-lingual speaker recognition are the key challenges of SRE21, we introduce several techniques to reduce the cross-domain mismatch. Specifically, codec processing and speech enhancement are applied directly to the raw speech to reduce the codec and environmental-noise mismatch. We refer to these methods, which work directly on raw audio to remove relatively explicit mismatch, collectively as data adaptation methods. Experiments show that data adaptation methods achieve a 15% improvement over our baseline. Furthermore, several popular back-end domain adaptation algorithms are applied to the speaker embeddings to alleviate the performance degradation caused by implicit mismatch. Score calibration was a major failure for us in SRE21, because calibration with too many parameters easily overfits.
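
As a rough illustration of the circle loss mentioned in the abstract, the sketch below follows the general pair-wise formulation from the original circle loss literature, not necessarily the variant used in the HCCL systems. The function names, the `margin` and `gamma` defaults, and the example similarity values are assumptions chosen for illustration only.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def circle_loss(pos_sims, neg_sims, margin=0.25, gamma=64.0):
    """Pair-wise circle loss for one anchor (illustrative sketch).

    pos_sims: cosine similarities to same-speaker embeddings.
    neg_sims: cosine similarities to different-speaker embeddings.
    margin and gamma are hypothetical defaults, not values from the paper.
    """
    optimum_pos, optimum_neg = 1.0 + margin, -margin
    delta_pos, delta_neg = 1.0 - margin, margin

    # Each similarity gets its own weight: pairs far from their optimum
    # are penalized more strongly than pairs that are already well placed.
    logits_pos = [-gamma * max(optimum_pos - s, 0.0) * (s - delta_pos)
                  for s in pos_sims]
    logits_neg = [gamma * max(s - optimum_neg, 0.0) * (s - delta_neg)
                  for s in neg_sims]

    z = _logsumexp(logits_pos) + _logsumexp(logits_neg)
    # Stable softplus: log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Hypothetical similarity scores: a well-separated and a confused anchor.
easy = circle_loss([0.9], [0.1])
hard = circle_loss([0.3], [0.7])
```

With these made-up scores, the well-separated anchor yields a loss close to zero while the confused anchor yields a much larger one, which is the behavior that pushes embeddings of the same speaker together and those of different speakers apart.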

Cite as: Li, Z., Xiao, R., Chen, H., Zhao, Z., Zhang, Z., Wang, W. (2022) The HCCL System for the NIST SRE21. Proc. Interspeech 2022, 3709-3713, doi: 10.21437/Interspeech.2022-10342

@inproceedings{li22s_interspeech,
  author={Zhuo Li and Runqiu Xiao and Hangting Chen and Zhenduo Zhao and Zihan Zhang and Wenchao Wang},
  title={{The HCCL System for the NIST SRE21}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={3709--3713},
  doi={10.21437/Interspeech.2022-10342}
}