A study on the effects of using short utterance length development data in the design of GPLDA speaker verification systems

Kanagasundaram, Ahilan; Dean, David; Sridharan, Sridha; Ghaemmaghami, Houman; Fookes, Clinton

doi:10.1007/s10772-017-9402-8

Published: 16 February 2017

Volume 20, pages 247–259 (2017)

International Journal of Speech Technology

Ahilan Kanagasundaram¹,² (ORCID: orcid.org/0000-0002-0533-7986),
David Dean²,
Sridha Sridharan²,
Houman Ghaemmaghami² &
Clinton Fookes²

Abstract

This paper studies the performance degradation of Gaussian probabilistic linear discriminant analysis (GPLDA) speaker verification systems when only short-utterance data is used for system development. A number of techniques, including utterance partitioning and source-normalised weighted linear discriminant analysis (SN-WLDA) projections, are then introduced to improve speaker verification performance in such conditions. Experimental studies found that when short-utterance data is available for speaker verification development, the GPLDA system overall achieves its best performance with a lower number of universal background model (UBM) components. This is a useful observation, because a lower number of UBM components significantly reduces the computational complexity of the speaker verification system. For limited session data conditions, we propose a simple utterance-partitioning technique which, when applied to the LDA-projected GPLDA system, shows over 8% relative improvement in equal error rate (EER) over the baseline system on the NIST 2008 truncated 10–10 s conditions. We conjecture that this improvement arises from the apparent increase in the number of sessions produced by the partitioning technique, which helps to better model the GPLDA parameters. Further, partitioning SN-WLDA-projected GPLDA shows over 16% and 6% relative improvement in EER over LDA-projected GPLDA systems on the NIST 2008 truncated 10–10 s interview-interview condition and on the NIST 2010 truncated 10–10 s interview-interview and telephone-telephone conditions, respectively.
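As a rough illustration of the two ideas the abstract relies on, the sketch below splits one utterance into several contiguous segments (so that each segment can be treated as an extra pseudo-session) and computes a relative EER improvement. The abstract does not specify how the partitioning is done, so the equal-length contiguous split, the function names `partition_utterance` and `relative_improvement`, and the numbers in the usage lines are all assumptions made for illustration, not the authors' procedure or results.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def partition_utterance(frames: Sequence[Frame], num_parts: int) -> List[List[Frame]]:
    """Split a sequence of feature frames into contiguous, near-equal segments.

    Assumption: an equal-length contiguous split; the paper's actual
    partitioning scheme is not described in the abstract.
    """
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    if len(frames) < num_parts:
        raise ValueError("utterance has fewer frames than requested parts")

    base, extra = divmod(len(frames), num_parts)
    segments: List[List[Frame]] = []
    start = 0
    for i in range(num_parts):
        # The first `extra` segments receive one additional frame.
        length = base + (1 if i < extra else 0)
        segments.append(list(frames[start:start + length]))
        start += length
    return segments


def relative_improvement(baseline_eer: float, new_eer: float) -> float:
    """Relative EER improvement in percent: 100 * (baseline - new) / baseline."""
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    # Ten placeholder frames split into three pseudo-sessions.
    toy_frames = list(range(10))
    print([len(s) for s in partition_utterance(toy_frames, 3)])

    # Hypothetical EER values, chosen only to show the arithmetic.
    print(relative_improvement(20.0, 18.0))
```

Each segment would then be passed through the same front end as a full utterance, which is how a single recording can contribute more than one session to GPLDA parameter estimation under this assumed scheme.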

Notes

Throughout this paper, we refer to subspace transformations using a functional notation B[A], meaning that the input is first transformed by technique A and then by technique B.
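The B[A] notation is ordinary function composition with the inner technique applied first. A minimal sketch, in which `compose`, `technique_a` and `technique_b` are hypothetical names and the two transforms are toy stand-ins rather than any projection used in the paper:

```python
from typing import Callable, List

Vector = List[float]
Transform = Callable[[Vector], Vector]


def compose(outer: Transform, inner: Transform) -> Transform:
    """Return the transform written as outer[inner]: inner runs first, then outer."""
    def composed(x: Vector) -> Vector:
        return outer(inner(x))
    return composed


def technique_a(x: Vector) -> Vector:
    # Toy stand-in for technique A: scale every component by 2.
    return [2.0 * v for v in x]


def technique_b(x: Vector) -> Vector:
    # Toy stand-in for technique B: shift every component by 1.
    return [v + 1.0 for v in x]


# B[A]: apply A first, then B.
b_of_a = compose(technique_b, technique_a)

if __name__ == "__main__":
    print(b_of_a([1.0, 3.0]))
```

The order matters: swapping the arguments gives A[B], which shifts before scaling and generally produces a different result.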

Acknowledgements

This research was funded by the Australian Research Council (ARC) Linkage Grant No: LP130100110.

Author information

Authors and Affiliations

Department of Electrical & Electronic Engineering, Faculty of Engineering, University of Jaffna, Kilinochchi, Sri Lanka
Ahilan Kanagasundaram
Speech Research Lab, SAIVT, Queensland University of Technology, Brisbane, QLD, Australia
Ahilan Kanagasundaram, David Dean, Sridha Sridharan, Houman Ghaemmaghami & Clinton Fookes

Corresponding author

Correspondence to Ahilan Kanagasundaram.

About this article

Cite this article

Kanagasundaram, A., Dean, D., Sridharan, S. et al. A study on the effects of using short utterance length development data in the design of GPLDA speaker verification systems. Int J Speech Technol 20, 247–259 (2017). https://doi.org/10.1007/s10772-017-9402-8

Received: 18 October 2016
Accepted: 25 January 2017
Published: 16 February 2017
Issue Date: June 2017
DOI: https://doi.org/10.1007/s10772-017-9402-8
