A study on the effects of using short utterance length development data in the design of GPLDA speaker verification systems

International Journal of Speech Technology

Abstract

This paper studies the performance degradation of a Gaussian probabilistic linear discriminant analysis (GPLDA) speaker verification system when only short-utterance data is used for system development. A number of techniques, including utterance partitioning and source-normalised weighted linear discriminant analysis (SN-WLDA) projections, are then introduced to improve speaker verification performance in such conditions. Experimental studies show that when short-utterance data is available for development, the GPLDA system achieves its best overall performance with a lower number of universal background model (UBM) components. Because a lower number of UBM components significantly reduces the computational complexity of the speaker verification system, this is a useful observation. For limited session data conditions, we propose a simple utterance-partitioning technique which, when applied to the LDA-projected GPLDA system, yields over 8% relative improvement in equal error rate (EER) over the baseline system on the NIST 2008 truncated 10–10 s conditions. We conjecture that this improvement arises from the apparent increase in the number of sessions produced by partitioning, which helps to better model the GPLDA parameters. Further, applying partitioning to the SN-WLDA-projected GPLDA system yields over 16% and 6% relative improvement in EER over LDA-projected GPLDA systems on the NIST 2008 truncated 10–10 s interview-interview condition and on the NIST 2010 truncated 10–10 s interview-interview and telephone-telephone conditions, respectively.
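To make the two quantities in the abstract concrete, the following is a minimal sketch of how an utterance might be partitioned into several pseudo-sessions and how a relative EER improvement is computed. The abstract does not specify the partitioning scheme, so the contiguous near-equal split used here is an assumption, and the function names `partition_utterance` and `relative_improvement` are hypothetical. The numbers in the usage lines are illustrative only and are not results from the paper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition_utterance(frames: Sequence[T], num_parts: int) -> List[List[T]]:
    """Split one utterance's frames into contiguous, near-equal sub-utterances.

    Assumption: a plain contiguous split. The partitioning scheme used in
    the paper is not described in the abstract and may differ.
    """
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    if num_parts > len(frames):
        raise ValueError("cannot create more parts than there are frames")
    base, extra = divmod(len(frames), num_parts)
    parts: List[List[T]] = []
    start = 0
    for i in range(num_parts):
        # The first `extra` parts each receive one additional frame.
        size = base + (1 if i < extra else 0)
        parts.append(list(frames[start:start + size]))
        start += size
    return parts


def relative_improvement(baseline_eer: float, new_eer: float) -> float:
    """Relative EER reduction in percent; positive means the new system is better."""
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    frames = list(range(10))  # stand-in for 10 feature frames
    print(partition_utterance(frames, 3))    # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(relative_improvement(20.0, 18.0))  # 10.0 (illustrative EER values)
```

Each sub-utterance would then be processed as if it were a separate session, which is the "apparent increase in the number of sessions" the abstract refers to.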

Notes

  1. Throughout this paper, we refer to subspace transformations using the functional notation B[A], meaning that the input is first transformed by technique A and then by technique B.
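The B[A] notation in the note above is ordinary function composition. A minimal sketch follows, in which `compose` and `matvec` are hypothetical helpers and the matrices are toy values chosen for illustration, not LDA or SN-WLDA projections from the paper.

```python
from typing import Callable, List

Vector = List[float]
Transform = Callable[[Vector], Vector]


def matvec(matrix: List[List[float]]) -> Transform:
    """Wrap a matrix as a linear transform x -> matrix @ x."""
    def apply(x: Vector) -> Vector:
        return [sum(m * v for m, v in zip(row, x)) for row in matrix]
    return apply


def compose(*outer_first: Transform) -> Transform:
    """compose(B, A) is the transform written B[A]: apply A first, then B."""
    def composed(x: Vector) -> Vector:
        # Apply the innermost (last-listed) transform first.
        for transform in reversed(outer_first):
            x = transform(x)
        return x
    return composed


if __name__ == "__main__":
    # Toy transforms (assumptions): A drops the third dimension, B rescales.
    A = matvec([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    B = matvec([[2.0, 0.0], [0.0, 3.0]])
    B_of_A = compose(B, A)  # corresponds to B[A]
    print(B_of_A([1.0, 2.0, 3.0]))  # [2.0, 6.0]
```

Note that order matters: here A maps three dimensions to two and B expects two, so A must be applied first, exactly as B[A] prescribes.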

Acknowledgements

This research was funded by the Australian Research Council (ARC) Linkage Grant No: LP130100110.

Author information

Corresponding author

Correspondence to Ahilan Kanagasundaram.

About this article

Cite this article

Kanagasundaram, A., Dean, D., Sridharan, S. et al. A study on the effects of using short utterance length development data in the design of GPLDA speaker verification systems. Int J Speech Technol 20, 247–259 (2017). https://doi.org/10.1007/s10772-017-9402-8

  • DOI: https://doi.org/10.1007/s10772-017-9402-8
