Residual Time-restricted Self-Attentive TDNN Speaker Embedding for Noisy and Far-field Conditions | IEEE Conference Publication | IEEE Xplore

Abstract:

One of the emerging challenges in automatic speaker recognition is the development of systems that are robust to noisy and far-field conditions. The current standard x-vector speaker embedding is based on a time-delay neural network (TDNN); in the presence of these conditions, it is less robust than systems based on a residual network (ResNet) and than other baseline systems that use signal-enhancement preprocessing. In this study, we improve the performance of TDNN-based embedding by integrating a residual block with optional time-restricted self-attention (AttResBlock) at the frame level of the TDNN. We evaluate the proposed speaker embedding extractor (AttResBlock-TDNN) in experiments on the Voices Obscured in Complex Environmental Settings (VOiCES) corpus. The experimental results show that AttResBlock-TDNN outperforms state-of-the-art systems under many adverse conditions. For instance, the proposed AttResBlock-TDNN produces relative improvements of 11.4% in the minimum detection cost function (minDCF) and 15.5% in the equal error rate (EER) over the original TDNN-based encoder.
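The abstract gives no architectural details, so the following is only a minimal NumPy sketch of the general idea of a residual connection around time-restricted self-attention, not the authors' AttResBlock. The function names, the single attention head, the window sizes `left` and `right`, and the omission of nonlinearities and normalization are all assumptions made for illustration.

```python
import numpy as np


def time_restricted_mask(num_frames, left, right):
    """Boolean mask: frame t may attend to frame s only if t-left <= s <= t+right."""
    idx = np.arange(num_frames)
    offset = idx[None, :] - idx[:, None]  # offset[t, s] = s - t
    return (offset >= -left) & (offset <= right)


def att_res_block(x, w_q, w_k, w_v, left=2, right=2):
    """Hypothetical residual block with single-head time-restricted self-attention.

    x: (num_frames, dim) frame-level features.
    w_q, w_k: (dim, d_k) projections; w_v: (dim, dim) so the residual sum is valid.
    Returns the block output and the attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = (q @ k.T) / np.sqrt(k.shape[1])

    # Frames outside the local window get -inf, i.e. zero weight after softmax.
    mask = time_restricted_mask(x.shape[0], left, right)
    scores = np.where(mask, scores, -np.inf)

    # Numerically stable row-wise softmax (each row always contains frame t itself).
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)

    # Residual connection: input plus attention output.
    return x + weights @ v, weights


rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))
w_q, w_k, w_v = (rng.standard_normal((4, 4)) for _ in range(3))
y, weights = att_res_block(x, w_q, w_k, w_v, left=2, right=2)
```

Restricting each frame to a short left and right context keeps the attention cost linear in the window size and matches the local-context behaviour of TDNN layers; the residual sum lets the block fall back to the identity when the attention output is small.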
Date of Conference: 18-20 September 2022
Date Added to IEEE Xplore: 17 October 2022
Conference Location: Halifax, NS, Canada
