A Comparison of Covariance Matrix and i-vector Based Speaker Recognition

  • Conference paper
Speech and Computer (SPECOM 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Abstract

The paper presents an evaluation of covariance matrix and i-vector based speaker identification methods on the Serbian S70W100s120 database. An open-set speaker identification evaluation scheme was adopted, with 20 target speakers and 60 impostors. Additional utterances from 41 speakers were used for training. The amount of data for modeling a target speaker was limited to about 4 s of speech. In this study, the i-vector based approach performed significantly better (equal error rate, EER, of ~5%) than the covariance matrix based approach (EER ~16%). This small EER for the i-vector based approach was obtained after substantially reducing the number of parameters in the universal background model, the i-vector transformation matrix and the Gaussian probabilistic linear discriminant analysis model, relative to the values typically reported in the literature. Additionally, the experiments showed that cepstral mean and variance normalization can degrade the EER when all recordings come from a single channel.
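The abstract reports results as equal error rates but does not say how they were computed. As a rough illustration only, the sketch below estimates an EER from two lists of trial scores by sweeping a decision threshold and picking the point where the false acceptance and false rejection rates are closest. The function name, the example scores and the convention that a higher score means "more likely the target speaker" are all assumptions made for this example, not details taken from the paper.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER and the threshold at which it occurs.

    Assumes higher scores indicate a better match to the target speaker
    (an assumption for this sketch, not stated in the paper).
    """
    # Candidate thresholds: every distinct observed score.
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # False rejection: a genuine target trial scores below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scores at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates at the closest crossing.
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical scores, purely for illustration.
    targets = [2.0, 3.0, 4.0, 5.0]
    impostors = [0.0, 1.0, 2.5, 1.5]
    eer, threshold = equal_error_rate(targets, impostors)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

With finite trial lists the two error rates rarely cross exactly, so this sketch reports the midpoint at the closest threshold. Evaluation toolkits often interpolate instead, which can give slightly different values.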

Acknowledgments

This research work has been supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia, and it has been realized as a part of the research project TR 32035 and EUREKA project DANSPLAT (project ID 9944).

Author information

Corresponding author

Correspondence to Nikša Jakovljević.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Jakovljević, N., Jokić, I., Jošić, S., Delić, V. (2017). A Comparison of Covariance Matrix and i-vector Based Speaker Recognition. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_3

  • DOI: https://doi.org/10.1007/978-3-319-66429-3_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66428-6

  • Online ISBN: 978-3-319-66429-3

  • eBook Packages: Computer Science, Computer Science (R0)
