Abstract
This paper presents a speaker verification (SV) system that uses deep neural networks with hash representations (binarization) of embeddings. The system is trained on the NIST SRE training set and evaluated on the test set of the same corpus. The architecture is based on deep recurrent layers with an attention mechanism, and semi-hard triplet selection is used during training. The final layer of the network is a tanh function, which makes it possible to train the hash representation end-to-end. As a result, the system reduces the embedding memory size by a factor of 32 and improves system evaluation performance. The equal error rate (EER) is reported relative to embeddings without binarization.
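The memory figure can be illustrated with a minimal sketch. It assumes that the real-valued embedding is stored as 32-bit floats and that binarization keeps only the sign of each tanh output, so each dimension shrinks from 32 bits to 1 bit. The function names, the 256-dimensional size, and the random stand-in embedding are hypothetical and are not taken from the paper.

```python
import math
import random


def binarize(embedding):
    """Pack the signs of tanh outputs into an integer: bit i is 1 when embedding[i] > 0."""
    bits = 0
    for i, value in enumerate(embedding):
        if value > 0:
            bits |= 1 << i
    return bits


def hamming_distance(code_a, code_b):
    """Count the bit positions in which two binary codes differ."""
    return bin(code_a ^ code_b).count("1")


def float32_bytes(dim):
    """Storage for a real-valued embedding, assuming 4 bytes (float32) per dimension."""
    return dim * 4


def packed_bytes(dim):
    """Storage for a binarized embedding at 1 bit per dimension, rounded up to whole bytes."""
    return (dim + 7) // 8


# Hypothetical embedding size; random values stand in for network outputs.
DIM = 256
rng = random.Random(0)
embedding = [math.tanh(rng.gauss(0.0, 1.0)) for _ in range(DIM)]

code = binarize(embedding)
ratio = float32_bytes(DIM) / packed_bytes(DIM)
print(f"{float32_bytes(DIM)} bytes -> {packed_bytes(DIM)} bytes ({ratio:.0f}x smaller)")
```

With binary codes, two embeddings can be compared by Hamming distance (an XOR followed by a bit count) instead of a floating-point distance. This is one plausible reading of the performance claim, not a description of the authors' implementation.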
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Tkachenko, M., Yamshinin, A., Kotov, M., Nastasenko, M. (2018). Lightweight Embeddings for Speaker Verification. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_70
DOI: https://doi.org/10.1007/978-3-319-99579-3_70
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99578-6
Online ISBN: 978-3-319-99579-3
eBook Packages: Computer Science, Computer Science (R0)