Abstract
The total variability model has been shown to be effective for text-independent speaker verification. It provides a tractable way to estimate the so-called i-vector, which describes the speaker and session variability rendered in a whole utterance. To extract the local session variability that an i-vector neglects, local variability models have been proposed, including the Gaussian-oriented and the dimension-oriented local variability models. This paper presents a consolidated study of the total and local variability models and compares them in full under the same framework. In addition, new extensions are proposed for the existing local variability models. The total variability model and the local variability models are compared in experiments on the NIST SRE’08 and SRE’10 datasets. Furthermore, in these experiments, the dimension-oriented local variability models are shown to capture session variability that is complementary to that estimated by the total variability model.
Acknowledgments
The work of Liping Chen was partially supported by the National Natural Science Foundation of China (Grant No. 61273264) and the electronic information industry development fund of China (Grant No. 2013-472).
Cite this article
Chen, L., Lee, K.A., Ma, B. et al. Exploration of Local Variability in Text-Independent Speaker Verification. J Sign Process Syst 82, 217–228 (2016). https://doi.org/10.1007/s11265-015-0997-1
DOI: https://doi.org/10.1007/s11265-015-0997-1