Automatic Error Correction for Speaker Embedding Learning with Noisy Labels

Tong, Fuchuan; Liu, Yan; Li, Song; Wang, Jie; Li, Lin; Hong, Qingyang

doi:10.21437/Interspeech.2021-2021

Deep neural networks have achieved superior performance in speaker verification, but much of that success depends on the availability of large-scale, carefully labeled datasets. In practice, noisy labels often arise during data collection. In this paper, we propose an automatic error correction method for deep speaker embedding learning with noisy labels. Specifically, we propose a label noise correction loss that leverages the model’s generalization capability to correct noisy labels during training. In addition, we improve the vanilla AM-Softmax by introducing sub-centers, which yields a more robust estimate of the speaker posterior. When applied to the VoxCeleb dataset, the proposed method degrades gracefully as noisy labels are introduced. Moreover, when combined with Bayesian estimation of PLDA with noisy training labels at the back-end, the whole system performs better under conditions in which noisy labels are present.
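The abstract gives no formulas, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes that each speaker has K sub-centers and that the class score is the maximum cosine similarity over them, that the additive margin is applied to the labeled class only, and that label correction can be approximated by a generic bootstrapping-style blend of the given label with the model's own top prediction. The function names, the `margin`, `scale`, and `beta` parameters, and their default values are all hypothetical and are not taken from the paper.

```python
import numpy as np

def subcenter_cosine(emb, weights):
    """emb: (B, D) embeddings; weights: (C, K, D) sub-centers per speaker.
    Returns (B, C): the max cosine similarity over the K sub-centers (assumed pooling)."""
    emb_n = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    w_n = weights / np.linalg.norm(weights, axis=2, keepdims=True)
    cos = np.einsum("bd,ckd->bck", emb_n, w_n)  # cosine to every sub-center
    return cos.max(axis=2)

def am_softmax_posterior(cos, labels, margin=0.2, scale=30.0):
    """Additive-margin softmax: subtract the margin from the labeled class, then scale."""
    logits = cos.copy()
    logits[np.arange(len(labels)), labels] -= margin
    logits *= scale
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def noise_corrected_loss(post, labels, beta=0.8):
    """Illustrative bootstrapping-style loss (an assumption, not the paper's loss):
    blend cross-entropy on the given label with cross-entropy on the predicted class."""
    idx = np.arange(len(labels))
    pred = post.argmax(axis=1)
    eps = 1e-12
    ce_given = -np.log(post[idx, labels] + eps)
    ce_pred = -np.log(post[idx, pred] + eps)
    return float(np.mean(beta * ce_given + (1.0 - beta) * ce_pred))

# Toy usage with random data: 4 utterances, 5 speakers, 3 sub-centers, 8-dim embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
weights = rng.normal(size=(5, 3, 8))
labels = np.array([0, 1, 2, 3])
post = am_softmax_posterior(subcenter_cosine(emb, weights), labels)
loss = noise_corrected_loss(post, labels)
```

With `beta=1.0` the sketch reduces to ordinary cross-entropy on the given labels; smaller values put more trust in the model's own prediction, which is the sense in which a model's generalization could be used to override a suspect label.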

Cite as: Tong, F., Liu, Y., Li, S., Wang, J., Li, L., Hong, Q. (2021) Automatic Error Correction for Speaker Embedding Learning with Noisy Labels. Proc. Interspeech 2021, 4628-4632, doi: 10.21437/Interspeech.2021-2021

@inproceedings{tong21_interspeech,
  author={Fuchuan Tong and Yan Liu and Song Li and Jie Wang and Lin Li and Qingyang Hong},
  title={{Automatic Error Correction for Speaker Embedding Learning with Noisy Labels}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={4628--4632},
  doi={10.21437/Interspeech.2021-2021}
}