Statistics Pooling Time Delay Neural Network Based on X-Vector for Speaker Verification | IEEE Conference Publication | IEEE Xplore



Abstract:

This paper aims to improve x-vector-based speaker embeddings so that they capture more detailed speaker information for speaker verification. We propose a statistics pooling time delay neural network (TDNN), in which statistics pooling is integrated into every layer of the TDNN structure so that the frame-level transformation accounts for variation in temporal context. The proposed feature vector, named stats-vector, is compared with the baseline x-vector on the VoxCeleb and Speakers in the Wild (SITW) datasets for speaker verification. The experimental results showed that the proposed stats-vector with score fusion achieved the best performance on the VoxCeleb1 dataset. Furthermore, when recordings contain interference from other speakers, the proposed stats-vector effectively reduced this interference and improved speaker verification performance on the SITW dataset.
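Standard x-vector statistics pooling, which the stats-vector builds on, maps a variable-length sequence of frame-level vectors to one fixed-length vector by concatenating the per-dimension mean and standard deviation over time. A minimal pure-Python sketch follows; the function name `statistics_pooling` and the variance floor value are illustrative assumptions, not taken from the paper.

```python
import math
from typing import List, Sequence


def statistics_pooling(
    frames: Sequence[Sequence[float]], variance_floor: float = 1e-10
) -> List[float]:
    """Map T frame-level vectors of dimension D to one 2*D segment-level vector.

    Output layout: [mean_1, ..., mean_D, std_1, ..., std_D], computed over time.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]
    # Population variance per dimension, floored (assumed value) so that the
    # square root stays well-behaved when a dimension is constant over time.
    variances = [
        sum((frame[d] - means[d]) ** 2 for frame in frames) / num_frames
        for d in range(dim)
    ]
    stds = [math.sqrt(max(v, variance_floor)) for v in variances]
    return means + stds


# Two frames of a 2-dimensional feature sequence.
print(statistics_pooling([[1.0, 0.0], [3.0, 4.0]]))  # [2.0, 2.0, 1.0, 2.0]
```

Because the output length depends only on D, utterances of any duration yield vectors of the same size; in the usual x-vector design, segment-level layers after this pooling produce the embedding.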
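The abstract does not specify exactly how statistics pooling is attached to each TDNN layer. The sketch below assumes one plausible reading: each TDNN layer is a ReLU layer over frames spliced at fixed context offsets (equivalent to a dilated 1-D convolution), the mean and standard deviation of every layer's output are pooled over time, and the per-layer statistics are concatenated into one segment-level vector. The helper names `tdnn_layer` and `stats_vector`, the layer tuple format, and the concatenation scheme are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
# Hypothetical layer spec: (one D_out x D_in weight matrix per offset, bias, offsets).
Layer = Tuple[List[Matrix], List[float], List[int]]


def statistics_pooling(frames: Sequence[Sequence[float]], variance_floor: float = 1e-10) -> List[float]:
    """Concatenate per-dimension mean and standard deviation over time."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    variances = [sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dim)]
    return means + [math.sqrt(max(v, variance_floor)) for v in variances]


def tdnn_layer(frames: Sequence[Sequence[float]], layer: Layer) -> Matrix:
    """One TDNN layer: ReLU(bias + sum_k W_k x[t + offset_k]) for every valid t."""
    weights, bias, offsets = layer
    first, last = min(offsets), max(offsets)
    outputs = []
    # Only frames whose full context window lies inside the input are produced.
    for t in range(-first, len(frames) - last):
        y = list(bias)
        for w, offset in zip(weights, offsets):
            x = frames[t + offset]
            for o in range(len(y)):
                y[o] += sum(w[o][i] * x[i] for i in range(len(x)))
        outputs.append([max(v, 0.0) for v in y])
    return outputs


def stats_vector(frames: Sequence[Sequence[float]], layers: Sequence[Layer]) -> List[float]:
    """Run the TDNN stack and concatenate the statistics pooled after every layer."""
    hidden: Matrix = [list(f) for f in frames]
    pooled: List[float] = []
    for layer in layers:
        hidden = tdnn_layer(hidden, layer)
        pooled.extend(statistics_pooling(hidden))
    return pooled


# Toy 1-dimensional example: an identity layer, then a layer averaging frames t-1 and t+1.
layers = [
    ([[[1.0]]], [0.0], [0]),
    ([[[0.5]], [[0.5]]], [0.0], [-1, 1]),
]
vec = stats_vector([[1.0], [2.0], [3.0], [4.0]], layers)
print(vec)  # mean/std after layer 1, then mean/std after layer 2
```

Since each stacked TDNN layer widens the temporal context seen by its outputs, pooling after every layer gives a segment-level vector that mixes statistics from several context widths. Whether the paper pools over the whole utterance or over local windows, and how it combines the pooled vectors, is not stated in the abstract.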
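The abstract reports its best VoxCeleb1 result for the stats-vector combined with score fusion, but does not describe the back-end or the fusion rule. As an assumption, the sketch below scores each trial with cosine similarity (a common simple stand-in for a PLDA back-end) and fuses the x-vector and stats-vector scores with a weighted sum. The function names, the toy embeddings, and the equal weights are hypothetical.

```python
import math
from typing import Sequence


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings (higher suggests the same speaker)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-norm embedding")
    return dot / (norm_a * norm_b)


def fuse_scores(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Linear score-level fusion: weighted sum of per-system trial scores."""
    if len(scores) != len(weights):
        raise ValueError("need one weight per system score")
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical trial: enrollment/test embeddings from two systems.
xvec_enroll, xvec_test = [0.2, 0.9, 0.1], [0.3, 0.8, 0.0]
stats_enroll, stats_test = [1.0, 0.0, 0.5, 0.5], [0.9, 0.1, 0.4, 0.6]
fused = fuse_scores(
    [cosine_score(xvec_enroll, xvec_test), cosine_score(stats_enroll, stats_test)],
    weights=[0.5, 0.5],
)
print(f"fused score: {fused:.4f}")
```

In practice, fusion weights are typically tuned on held-out trials, and scores from different systems are often calibrated or normalized before being combined, since their ranges can differ.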
Date of Conference: 04-08 May 2020
Date Added to IEEE Xplore: 09 April 2020
Conference Location: Barcelona, Spain
