In this paper, we examine a new way of using phonetic DNNs in text-independent speaker recognition. Inspired by the Subspace GMM approach to speech recognition, we try to extract i-vectors that are invariant to the phonetic content of the utterance. We overcome the assumption of Gaussian-distributed senones by combining DNN with UBM posteriors, and we derive a complete EM algorithm for training and for extracting phonetic-content-compensated i-vectors. A simplified version of the model is also presented, in which the phonetic-content and speaker subspaces are learned in a decoupled way. Covariance adaptation is also examined, where the covariance matrices are re-estimated rather than copied from the UBM. A set of preliminary experimental results is reported on NIST SRE 2010, showing a modest improvement when the proposed i-vectors are fused with standard i-vectors.
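As background for the approach the abstract summarizes, the standard DNN/i-vector recipe replaces the UBM's Gaussian occupation probabilities with per-frame senone posteriors from the phonetic DNN when accumulating Baum-Welch sufficient statistics. A minimal sketch of that accumulation step (the function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def accumulate_stats(features, posteriors):
    """Accumulate zeroth- and first-order Baum-Welch statistics.

    features:   (T, D) array of acoustic feature frames
    posteriors: (T, C) array of per-frame senone posteriors from the DNN,
                used in place of the UBM's Gaussian occupation probabilities
    """
    N = posteriors.sum(axis=0)   # (C,)   zeroth-order statistics per senone
    F = posteriors.T @ features  # (C, D) first-order statistics per senone
    return N, F
```

These per-utterance statistics are what a total-variability (i-vector) model consumes; the paper's contribution concerns how the phonetic content reflected in these statistics is compensated for.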
Cite as: Stafylakis, T., Kenny, P., Gupta, V., Alam, J., Kockmann, M. (2016) Compensation for phonetic nuisance variability in speaker recognition using DNNs. Proc. The Speaker and Language Recognition Workshop (Odyssey 2016), 340-345, doi: 10.21437/Odyssey.2016-49
@inproceedings{stafylakis16_odyssey,
  author={Themos Stafylakis and Patrick Kenny and Vishwa Gupta and Jahangir Alam and Marcel Kockmann},
  title={{Compensation for phonetic nuisance variability in speaker recognition using DNNs}},
  year={2016},
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2016)},
  pages={340--345},
  doi={10.21437/Odyssey.2016-49}
}