Abstract
This paper studies the performance degradation of Gaussian probabilistic linear discriminant analysis (GPLDA) speaker verification systems when only short-utterance data is used for system development. A number of techniques, including utterance partitioning and source-normalised weighted linear discriminant analysis (SN-WLDA) projections, are then introduced to improve speaker verification performance in such conditions. Experiments show that when short-utterance data is available for development, the GPLDA system achieves its best overall performance with a lower number of universal background model (UBM) components. This is a useful observation, because a lower number of UBM components significantly reduces the computational complexity of the speaker verification system. For limited session data conditions, we propose a simple utterance-partitioning technique which, when applied to the LDA-projected GPLDA system, shows over 8% relative improvement in equal error rate (EER) over the baseline system on the NIST 2008 truncated 10–10 s conditions. We conjecture that this improvement arises from the apparent increase in the number of sessions produced by partitioning, which helps to better model the GPLDA parameters. Further, the partitioned SN-WLDA-projected GPLDA system shows over 16% and 6% relative improvement in EER over LDA-projected GPLDA systems on the NIST 2008 truncated 10–10 s interview-interview condition and on the NIST 2010 truncated 10–10 s interview-interview and telephone-telephone conditions, respectively.
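The idea behind utterance partitioning can be illustrated with a minimal sketch. The abstract does not specify how utterances are split, so the code below assumes the simplest scheme: contiguous, near-equal segments of feature frames. The function name `partition_utterance` and the frame counts are hypothetical. Each segment would then be processed as if it were a separate session, which is the apparent increase in session count that the abstract refers to.

```python
from typing import List, Sequence


def partition_utterance(frames: Sequence, num_parts: int) -> List[list]:
    """Split a sequence of feature frames into contiguous, near-equal segments.

    Assumption: contiguous equal-length splitting; the paper's actual
    partitioning scheme is not described in the abstract.
    """
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    if len(frames) < num_parts:
        raise ValueError("not enough frames to fill every segment")

    base, extra = divmod(len(frames), num_parts)
    segments = []
    start = 0
    for i in range(num_parts):
        # The first `extra` segments take one additional frame each.
        length = base + (1 if i < extra else 0)
        segments.append(list(frames[start:start + length]))
        start += length
    return segments


# Hypothetical utterance of 1000 frames, represented here by frame indices.
frames = list(range(1000))
segments = partition_utterance(frames, 4)
print([len(s) for s in segments])  # [250, 250, 250, 250]
```

Every frame lands in exactly one segment, so no development data is discarded; only the number of pseudo-sessions per speaker changes.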
Notes
Throughout this paper, we will refer to subspace transformations using a functional notation B[A] meaning that the input is first transformed by technique A, then technique B.
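The B[A] notation is ordinary function composition, which a short sketch makes concrete. The two transformations here, `technique_a` and `technique_b`, are hypothetical stand-ins chosen only so that the order of application is visible; they are not the projections used in the paper.

```python
from typing import Callable, List

Vector = List[float]
Transform = Callable[[Vector], Vector]


def compose(outer: Transform, inner: Transform) -> Transform:
    """Return outer[inner]: the input is transformed by inner first, then outer."""
    return lambda x: outer(inner(x))


# Hypothetical stand-ins for two subspace transformations.
def technique_a(x: Vector) -> Vector:
    return [2.0 * v for v in x]


def technique_b(x: Vector) -> Vector:
    return [v + 1.0 for v in x]


b_of_a = compose(technique_b, technique_a)  # B[A]
print(b_of_a([1.0, 2.0]))  # [3.0, 5.0]
```

Swapping the arguments gives A[B], which in general produces a different result, so the bracket order in the notation matters.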
Acknowledgements
This research was funded by the Australian Research Council (ARC) Linkage Grant No: LP130100110.
Cite this article
Kanagasundaram, A., Dean, D., Sridharan, S. et al. A study on the effects of using short utterance length development data in the design of GPLDA speaker verification systems. Int J Speech Technol 20, 247–259 (2017). https://doi.org/10.1007/s10772-017-9402-8
DOI: https://doi.org/10.1007/s10772-017-9402-8