Exploration of Local Variability in Text-Independent Speaker Verification

Journal of Signal Processing Systems

Abstract

The total variability model has been shown to be effective for text-independent speaker verification. It provides a tractable way to estimate the so-called i-vector, which describes the speaker and session variability present in a whole utterance. To extract the local session variability that an i-vector neglects, local variability models were proposed, including the Gaussian-oriented and the dimension-oriented local variability models. This paper presents a consolidated study of the total and local variability models and gives a full comparison between them under the same framework. In addition, new extensions are proposed for the existing local variability models. The total variability model and the local variability models are compared through experiments on the NIST SRE’08 and SRE’10 datasets. Furthermore, in these experiments, the dimension-oriented local variability models are shown to capture session variability that is complementary to that estimated by the total variability model.
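
For orientation, the total variability model mentioned in the abstract is usually written in the form sketched below. The notation is the common textbook one and is an assumption here; it is not taken from this paper, whose own symbols may differ. The first line models the utterance-dependent GMM mean supervector as a low-rank offset from the UBM supervector, and the second line gives the i-vector as the posterior mean of the latent factor given the utterance statistics.

```latex
% Assumed notation (not from the paper):
%   M : utterance-dependent GMM mean supervector
%   m : UBM mean supervector
%   T : low-rank total variability matrix
%   w : latent factor with a standard normal prior
%   N : block-diagonal matrix of zeroth-order (occupancy) statistics
%   \tilde{F} : first-order statistics centered on the UBM means
%   \Sigma : block-diagonal matrix of UBM covariances
\begin{align}
  \mathbf{M} &= \mathbf{m} + \mathbf{T}\mathbf{w},
  \qquad \mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}), \\
  \hat{\mathbf{w}} &= \left(\mathbf{I} + \mathbf{T}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{N}\,\mathbf{T}\right)^{-1}
  \mathbf{T}^{\top}\boldsymbol{\Sigma}^{-1}\tilde{\mathbf{F}}.
\end{align}
```

Because a single vector w summarizes the whole utterance across all Gaussians and feature dimensions, variability tied to individual Gaussians or individual dimensions is not represented separately, which is the gap the local variability models aim to fill.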

Acknowledgments

The work of Liping Chen was partially supported by the National Natural Science Foundation of China (Grant No. 61273264) and the electronic information industry development fund of China (Grant No. 2013-472).

Author information

Corresponding author

Correspondence to Li-Rong Dai.

About this article

Cite this article

Chen, L., Lee, K.A., Ma, B. et al. Exploration of Local Variability in Text-Independent Speaker Verification. J Sign Process Syst 82, 217–228 (2016). https://doi.org/10.1007/s11265-015-0997-1

  • DOI: https://doi.org/10.1007/s11265-015-0997-1
