
Lightweight Embeddings for Speaker Verification

  • Conference paper
Speech and Computer (SPECOM 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11096)

Abstract

This paper presents a speaker verification (SV) system that uses deep neural networks with hash representations (binarization) of embeddings. Training is performed on the NIST SRE training set, and verification is performed on the test set of the same corpus. The system architecture is based on deep recurrent layers with an attention mechanism. Semi-hard triplet selection is used during training. The final layer of the neural network uses the tanh function, which makes it possible to train the hash representation end-to-end. As a consequence, the system reduces the embedding memory size by a factor of 32 and increases the system's evaluation performance. The equal error rate (EER) is reported relative to embeddings without binarization.
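The 32x figure in the abstract is consistent with replacing each 32-bit floating-point embedding value by a single bit. The following is a minimal sketch of that idea, not the authors' implementation: it assumes float32 storage, a hypothetical embedding size of 512, and sign thresholding of tanh outputs, none of which are stated in this preview. The helper names `binarize` and `hamming_distance` are illustrative.

```python
import math
import random
from array import array


def binarize(embedding):
    """Pack the signs of tanh-range values into bytes (1 bit per dimension)."""
    bits = 0
    for i, value in enumerate(embedding):
        if value > 0:  # assumption: positive -> 1, otherwise -> 0
            bits |= 1 << i
    n_bytes = (len(embedding) + 7) // 8
    return bits.to_bytes(n_bytes, "little")


def hamming_distance(code_a, code_b):
    """Count differing bits between two packed binary codes."""
    diff = int.from_bytes(code_a, "little") ^ int.from_bytes(code_b, "little")
    return bin(diff).count("1")


random.seed(0)
dim = 512  # hypothetical embedding size, not taken from the paper

# Synthetic stand-ins for network outputs: tanh of random pre-activations.
pre_a = [random.gauss(0.0, 1.0) for _ in range(dim)]
emb_a = [math.tanh(p) for p in pre_a]
# A slightly perturbed copy (same "speaker") and an unrelated vector.
emb_b = [math.tanh(p + random.gauss(0.0, 0.3)) for p in pre_a]
emb_c = [math.tanh(random.gauss(0.0, 1.0)) for _ in range(dim)]

code_a, code_b, code_c = binarize(emb_a), binarize(emb_b), binarize(emb_c)

# Memory footprint: float32 storage versus packed bits.
float_bytes = len(emb_a) * array("f").itemsize
compression = float_bytes // len(code_a)

print("float32 bytes:", float_bytes)
print("binary bytes:", len(code_a))
print("compression factor:", compression)
print("same-speaker Hamming distance:", hamming_distance(code_a, code_b))
print("different-speaker Hamming distance:", hamming_distance(code_a, code_c))
```

Comparing packed codes needs only an XOR and a bit count, which is one plausible reason a binarized system could also be faster at scoring time. The abstract does not specify how scores are computed.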

Author information

Corresponding author

Correspondence to Maxim Tkachenko.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Tkachenko, M., Yamshinin, A., Kotov, M., Nastasenko, M. (2018). Lightweight Embeddings for Speaker Verification. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_70

  • DOI: https://doi.org/10.1007/978-3-319-99579-3_70

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99578-6

  • Online ISBN: 978-3-319-99579-3

  • eBook Packages: Computer Science, Computer Science (R0)
