
Score Normalization of X-Vector Speaker Verification System for Short-Duration Speaker Verification Challenge

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12335)

Abstract

In this paper, we present our contribution to Task 2 of the Short-duration Speaker Verification (SdSV) Challenge. The main goal of this challenge is to find new technologies for text-dependent and text-independent speaker verification in short-duration scenarios. We present several of the approaches we used while participating in the challenge. The described speaker verification systems include a baseline x-vector system with a PLDA backend and score normalization, an x-vector system with a neural PLDA backend, and a fusion of both systems.
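Score normalization standardizes a raw backend score using the statistics of impostor cohort scores. The sketch below shows Z-norm (enrollment model scored against a cohort of impostor utterances), T-norm (impostor cohort models scored against the test utterance) and ZT-norm (Z-norm followed by T-norm on Z-normalized cohort scores). It assumes the scores, for example PLDA log-likelihood ratios, are already computed by some backend and that statistics are taken over the full cohort; the function names are hypothetical, and the cohorts and any cohort selection used by the authors are not described here.

```python
import numpy as np


def z_norm(score, enroll_vs_zcohort):
    """Z-norm: standardize a trial score using the statistics of the
    enrollment model scored against a cohort of impostor utterances."""
    scores = np.asarray(enroll_vs_zcohort, dtype=float)
    return (score - scores.mean()) / scores.std()


def t_norm(score, tcohort_vs_test):
    """T-norm: standardize a trial score using the statistics of impostor
    cohort models scored against the test utterance."""
    scores = np.asarray(tcohort_vs_test, dtype=float)
    return (score - scores.mean()) / scores.std()


def zt_norm(score, enroll_vs_zcohort, tcohort_vs_test, tcohort_vs_zcohort):
    """ZT-norm: Z-normalize the trial score and the T-cohort scores, then T-norm.

    score:              raw backend score for (enrollment model, test utterance)
    enroll_vs_zcohort:  shape (Nz,)    enrollment model vs Z-cohort utterances
    tcohort_vs_test:    shape (Nt,)    T-cohort models vs the test utterance
    tcohort_vs_zcohort: shape (Nt, Nz) T-cohort models vs Z-cohort utterances
    """
    s_z = z_norm(score, enroll_vs_zcohort)
    # Each T-cohort score is Z-normalized with its own cohort model's
    # statistics, so that the T-norm step operates on Z-normalized scores.
    tz = np.asarray(tcohort_vs_zcohort, dtype=float)
    tcohort_test = np.asarray(tcohort_vs_test, dtype=float)
    tcohort_z = (tcohort_test - tz.mean(axis=1)) / tz.std(axis=1)
    return t_norm(s_z, tcohort_z)
```

In practice, the cohort score vectors are computed once per enrollment model and once per test utterance and then reused across all trials. Top-N (adaptive) cohort selection is a common variant that this sketch does not show.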

The main goal of this paper is to analyze the influence of different score normalization methods on the performance of x-vector based speaker verification systems. We found that the system with a PLDA backend and ZT-normalization (the single system) gives superior performance on Farsi trials but a smaller performance improvement on English trials. Overall, in terms of minDCF, the single system performs 46.3% better than the baseline x-vector system. We also found that enrollment data augmentation does not benefit the neural PLDA backend, as the performance of the system does not improve after adding augmented enrollment data. The single system with ZT-score normalization and additional enrollment audio augmentation performs 14.8% better than the neural PLDA backend system.
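The results above are reported in terms of minDCF and as relative improvements. The following is a minimal sketch of a normalized minimum detection cost function together with the relative-improvement arithmetic. It assumes that "X% better" means a relative reduction in minDCF, which the abstract does not state explicitly. The default operating point (P_target = 0.01, C_miss = 10, C_fa = 1) is believed to match the SdSV 2020 evaluation plan, but it is an assumption that should be checked against the plan. The function names are hypothetical.

```python
import numpy as np


def min_dcf(target_scores, nontarget_scores, p_target=0.01, c_miss=10.0, c_fa=1.0):
    """Normalized minimum detection cost over all decision thresholds.

    A trial is accepted when its score is >= the threshold.
    Default cost parameters are an assumption (see the lead-in).
    """
    tgt = np.sort(np.asarray(target_scores, dtype=float))
    non = np.sort(np.asarray(nontarget_scores, dtype=float))
    # Candidate thresholds: every observed score, plus +inf (reject all trials)
    thresholds = np.concatenate([np.unique(np.concatenate([tgt, non])), [np.inf]])
    # P_miss: fraction of target scores strictly below the threshold
    p_miss = np.searchsorted(tgt, thresholds, side="left") / tgt.size
    # P_fa: fraction of non-target scores at or above the threshold
    p_fa = 1.0 - np.searchsorted(non, thresholds, side="left") / non.size
    dcf = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
    # Normalize by the cost of the better trivial system (accept all or reject all)
    dcf_default = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return float(dcf.min() / dcf_default)


def relative_improvement(baseline, system):
    """Relative reduction of a cost metric such as minDCF, in percent."""
    return 100.0 * (baseline - system) / baseline
```

Sorting once and using `np.searchsorted` evaluates every candidate threshold without a per-threshold loop. Under the stated assumption, `relative_improvement(baseline_min_dcf, system_min_dcf)` returns 46.3 when a system's minDCF is 46.3% lower than the baseline's.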

Acknowledgements

The study was supported by a grant from the Russian Science Foundation (project 16-15-00038).

Author information

Correspondence to Ivan Rakhmanenko.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Rakhmanenko, I., Kostyuchenko, E., Choynzonov, E., Balatskaya, L., Shelupanov, A. (2020). Score Normalization of X-Vector Speaker Verification System for Short-Duration Speaker Verification Challenge. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_44

  • DOI: https://doi.org/10.1007/978-3-030-60276-5_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60275-8

  • Online ISBN: 978-3-030-60276-5

  • eBook Packages: Computer Science, Computer Science (R0)
