A Comparison of Covariance Matrix and i-vector Based Speaker Recognition

Jakovljević, Nikša; Jokić, Ivan; Jošić, Slobodan; Delić, Vlado

doi:10.1007/978-3-319-66429-3_3


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Included in the following conference series:

International Conference on Speech and Computer


Abstract

The paper presents an evaluation of covariance matrix based and i-vector based speaker identification methods on the Serbian S70W100s120 database. An open-set speaker identification evaluation scheme was adopted, with 20 target speakers and 60 impostors. Additional utterances from 41 speakers were used for training. The amount of data for modeling a target speaker was limited to about 4 s of speech. In this study, the i-vector based approach performed significantly better (equal error rate, EER, of about 5%) than the covariance matrix based approach (EER of about 16%). This low EER for the i-vector based approach was obtained after substantially reducing the number of parameters in the universal background model, the i-vector transformation matrix and the Gaussian probabilistic linear discriminant analysis relative to the values typically reported in the literature. Additionally, the experiments showed that cepstral mean and variance normalization can degrade the EER in the single-channel case.
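The equal error rate quoted above is the operating point at which the false acceptance rate (impostors accepted) equals the false rejection rate (target speakers rejected). The following is a minimal sketch of how such a figure could be estimated from a list of trial scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the example scores, and the convention that a trial is accepted when its score is at or above the threshold are all assumptions made for illustration, and the sketch ignores the additional misidentification errors an open-set protocol may count.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER from two lists of scores (hypothetical helper).

    A trial is accepted when its score is >= the threshold. Every observed
    score is tried as a threshold, and the one where the false acceptance
    rate (FAR) and false rejection rate (FRR) are closest is kept.
    Returns (eer, threshold), with eer taken as the mean of FAR and FRR.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None  # (gap between FAR and FRR, eer estimate, threshold)
    for t in thresholds:
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Made-up scores, only to exercise the function.
    targets = [0.9, 0.8, 0.7, 0.4]
    impostors = [0.6, 0.5, 0.3, 0.2, 0.1]
    eer, threshold = equal_error_rate(targets, impostors)
    print(f"EER ~ {eer:.3f} at threshold {threshold}")
```

With only a handful of trials the FAR and FRR curves rarely cross exactly, which is why the sketch reports the midpoint at the closest threshold; with a large trial list an interpolated crossing would be a common alternative.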



Acknowledgments

This research work has been supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia, and it has been realized as a part of the research project TR 32035 and EUREKA project DANSPLAT (project ID 9944).

Author information

Authors and Affiliations

Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
Nikša Jakovljević, Ivan Jokić, Slobodan Jošić & Vlado Delić


Corresponding author

Correspondence to Nikša Jakovljević.

Editor information

Editors and Affiliations

SPIIRAS, Saint Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
University of Hertfordshire, Hatfield, United Kingdom
Iosif Mporas

About this paper

Cite this paper

Jakovljević, N., Jokić, I., Jošić, S., Delić, V. (2017). A Comparison of Covariance Matrix and i-vector Based Speaker Recognition. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_3

DOI: https://doi.org/10.1007/978-3-319-66429-3_3
Published: 13 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science, Computer Science (R0)
