On the use of Total Variability and Probabilistic Linear Discriminant Analysis for Speaker Verification on Short Utterances

Domínguez, Javier González; Zazo, Rubén; González-Rodríguez, Joaquin

doi:10.1007/978-3-642-35292-8_2

Part of the book series: Communications in Computer and Information Science (CCIS, volume 328)

Abstract

This paper explores the use of state-of-the-art acoustic systems, namely Total Variability (TV) and Probabilistic Linear Discriminant Analysis (PLDA), for speaker verification on short utterances. While recent advances in handling the session variability problem have proved to greatly improve speaker verification systems in typical scenarios where a reasonable amount of speech is available, this performance degrades rapidly when data is limited in both the enrolment and verification stages. This paper studies the behaviour of TV and PLDA in scenarios where only a scarce amount of speech (~10 s) is available to train and test a speaker identity. The analysis has been carried out on the well-defined, standard 10s-10s task of the NIST Speaker Recognition Evaluation 2010 (NIST SRE10), and it explores the multiple parameters that define TV and PLDA in order to give some insight into their relevance in this specific scenario.

Author information

Authors and Affiliations

Biometric Recognition Group (ATVS), Escuela Politecnica Superior, Universidad Autonoma de Madrid, Spain
Javier González Domínguez, Rubén Zazo & Joaquin González-Rodríguez

Editor information

Editors and Affiliations

Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/ Francisco, Tomas y Valiente 11, 28049, Madrid, Spain
Doroteo Torre Toledano
Centro Politécnico Superior, Edificio Ada Byron, C/ María de Luna nº 1, 50018, Zaragoza, Spain
Alfonso Ortega Giménez
Universidade de Aveiro, Campus Universitário Aveiro, 3810-193, Aveiro, Portugal
António Teixeira
Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/ Francisco, Tomas y Valiente 11, 28049, Madrid, Spain
Joaquín González Rodríguez
E.T.S.I.Telecomunicacion, Universidad Politécnica de Madrid, Ciudad Universitaria s/n, 28040, Madrid, Spain
Luis Hernández Gómez & Rubén San Segundo Hernández
Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/ Francisco, Tomas y Valiente 11, 28049, Madrid, Spain
Daniel Ramos Castro

Cite this paper

Domínguez, J.G., Zazo, R., González-Rodríguez, J. (2012). On the use of Total Variability and Probabilistic Linear Discriminant Analysis for Speaker Verification on Short Utterances. In: Torre Toledano, D., et al. Advances in Speech and Language Technologies for Iberian Languages. Communications in Computer and Information Science, vol 328. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35292-8_2

DOI: https://doi.org/10.1007/978-3-642-35292-8_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35291-1
Online ISBN: 978-3-642-35292-8
eBook Packages: Computer Science, Computer Science (R0)
