Exploring Effective Data Augmentation with TDNN-LSTM Neural Network Embedding for Speaker Recognition | IEEE Conference Publication | IEEE Xplore

Abstract:

This paper proposes a speaker characterization approach that combines four data augmentation methods with a neural network embedding built from time delay neural network and long short-term memory layers (TDNN-LSTM). The data augmentation increases the amount and diversity of the training data through speed perturbation, volume perturbation, room impulse responses, and additive noise. The TDNN-LSTM based speaker embedding is intended to capture the temporal information in a speaker's speech better than conventional TDNN based x-vectors. The proposed methods were trained on the VoxCeleb dataset and tested on the Speakers In The Wild (SITW) dataset in the evaluation core-core condition. We achieved an equal error rate (EER) of 1.86%, a minimum decision cost function (DCF) of 0.204 at p-target=0.01, and a minimum DCF of 0.368 at p-target=0.001. The proposed methods outperform both the i-vector and x-vector baselines.
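
As an illustration only, the four augmentation types named in the abstract could be sketched in NumPy as follows. This is not the authors' pipeline: the function names, the linear-interpolation resampler, the peak normalization of the impulse response, and every factor, gain, and SNR value below are assumptions chosen for the example, since the abstract does not specify them.

```python
import numpy as np


def speed_perturb(wave, factor):
    """Resample by linear interpolation so playback is `factor` times faster.

    Assumption: a simple interpolating resampler stands in for whatever
    resampling the original work used.
    """
    n_out = int(round(len(wave) / factor))
    src_positions = np.linspace(0.0, len(wave) - 1, num=n_out)
    return np.interp(src_positions, np.arange(len(wave)), wave)


def volume_perturb(wave, gain):
    """Scale the amplitude by a constant gain."""
    return wave * gain


def add_reverb(wave, rir):
    """Convolve with a room impulse response, keeping the original length."""
    rir = rir / np.max(np.abs(rir))  # peak-normalize (assumed convention)
    return np.convolve(wave, rir)[: len(wave)]


def add_noise(wave, noise, snr_db):
    """Mix in noise scaled to reach the requested signal-to-noise ratio in dB."""
    noise = np.resize(noise, len(wave))  # repeat or truncate to match length
    sig_power = np.mean(wave ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    return wave + scale * noise


# Hypothetical usage on a synthetic one-second, 16 kHz tone.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
wave = 0.5 * np.sin(2 * np.pi * 220.0 * t)

faster = speed_perturb(wave, factor=1.1)    # assumed speed factor
quieter = volume_perturb(wave, gain=0.5)    # assumed gain
# Toy impulse response: exponentially decaying random taps (assumption).
rir = np.exp(-np.arange(800) / 100.0) * rng.standard_normal(800)
reverberant = add_reverb(wave, rir)
noisy = add_noise(wave, rng.standard_normal(4000), snr_db=10.0)
```

Each function returns a new array, so an augmented copy of an utterance can be added to the training set alongside the clean original, which is how augmentation increases both the amount and the diversity of the data.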
Date of Conference: 14-18 December 2019
Date Added to IEEE Xplore: 20 February 2020
Conference Location: Singapore
