Abstract:
In self-supervised speaker verification, the quality of the generated pseudo labels is a bottleneck for performance. This work introduces a dynamic threshold within the iterative DIstillation with NO labels (DINO) framework. We employ a two-component Gaussian Mixture Model (GMM) to model the loss distribution of the training data: one component represents samples with reliable labels, and the other represents samples with unreliable labels. These components help us determine a threshold for retaining samples with reliable labels. Furthermore, to exploit the differing sensitivity of network layers to label noise, we introduce hierarchical training to reduce the negative impact of unreliable labels. Compared to the baseline with a fixed threshold, our two strategies yield an 8.9% relative improvement on the Vox-O trial of the VoxCeleb1 evaluation dataset.
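
The GMM-based threshold described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes per-sample losses are available as a plain list, fits a two-component 1-D GMM with a hand-written EM loop, and places the threshold where the posterior of the low-loss component crosses 0.5. The abstract does not state how the threshold is derived from the two components, so that crossing rule, along with the function names `fit_gmm_1d` and `dynamic_threshold`, is an assumption made for illustration only.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(losses, iters=100):
    """Fit a two-component 1-D GMM with EM. Returns (weights, means, variances)."""
    n = len(losses)
    ordered = sorted(losses)
    # Initialise the two means from the lower and upper quartiles (a heuristic).
    mu = [ordered[n // 4], ordered[(3 * n) // 4]]
    overall_mean = sum(losses) / n
    overall_var = sum((x - overall_mean) ** 2 for x in losses) / n
    var = [overall_var, overall_var]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each loss value.
        resp = []
        for x in losses:
            p = [w[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append((0.5, 0.5) if total == 0.0 else (p[0] / total, p[1] / total))
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            spread = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, losses)) / nk
            var[k] = max(spread, 1e-6)  # floor keeps the variance from collapsing
    return w, mu, var


def clean_posterior(x, w, mu, var):
    """Posterior probability that loss x belongs to the low-mean component."""
    k = 0 if mu[0] <= mu[1] else 1
    p = [w[j] * normal_pdf(x, mu[j], var[j]) for j in range(2)]
    total = p[0] + p[1]
    return 0.5 if total == 0.0 else p[k] / total


def dynamic_threshold(w, mu, var, prob=0.5, steps=60):
    """Bisect between the two means for the loss where the posterior equals prob."""
    lo, hi = min(mu), max(mu)
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if clean_posterior(mid, w, mu, var) > prob:
            lo = mid  # still on the reliable side, move right
        else:
            hi = mid  # already on the unreliable side, move left
    return (lo + hi) / 2.0
```

In a training loop, one would refit the mixture on the current per-sample losses at each iteration, call `dynamic_threshold`, and keep only the samples whose loss falls below the returned value, so the cut-off adapts as the loss distribution shifts instead of staying fixed.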
Published in: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024