Abstract
This paper proposes an approach for extending deep neural network-based solutions for closed-set speaker identification to the open-set problem. The idea builds on a property of deep neural networks trained for classification tasks: one of their layers yields a vector of deep features extracted from the analyzed input. By extracting this vector and performing anomaly detection against the set of known speakers, new speakers can be detected and modeled for later re-identification. The approach is tested using the NeMo toolkit with the SpeakerNet architecture. The algorithm is shown to work when multiple new speakers are introduced.
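The detect-and-enroll loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the anomaly detector, so cosine similarity to per-speaker centroids with a fixed threshold is assumed here, and synthetic vectors stand in for embeddings that would in practice come from a pretrained closed-set model such as SpeakerNet. The class name `OpenSetIdentifier`, the threshold value 0.7, and the helper `utterance` are all hypothetical.

```python
import numpy as np


def _unit(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


class OpenSetIdentifier:
    """Toy open-set identifier over fixed-size embeddings (hypothetical)."""

    def __init__(self, threshold=0.7):
        self.threshold = threshold  # assumed cosine-similarity cut-off
        self.sums = {}              # speaker id -> sum of unit embeddings
        self._next_new = 0          # counter used to name new speakers

    def enroll(self, speaker_id, embedding):
        """Add an embedding to a speaker's model, creating it if needed."""
        self.sums[speaker_id] = self.sums.get(speaker_id, 0.0) + _unit(embedding)

    def identify(self, embedding):
        """Return (speaker_id, is_new); unknown speakers get a new model."""
        e = _unit(embedding)
        best_id, best_sim = None, -1.0
        for sid, total in self.sums.items():
            sim = float(e @ _unit(total))  # similarity to speaker centroid
            if sim > best_sim:
                best_id, best_sim = sid, sim
        if best_id is not None and best_sim >= self.threshold:
            return best_id, False
        # Anomaly: no known speaker is close enough, so enroll a new one.
        new_id = f"new_{self._next_new}"
        self._next_new += 1
        self.enroll(new_id, e)
        return new_id, True


# Synthetic stand-ins for embeddings (assumption: 64-dimensional vectors).
rng = np.random.default_rng(0)
dim = 64
centers = {n: _unit(rng.normal(size=dim)) for n in ("alice", "bob", "carol")}


def utterance(name):
    """Simulate one utterance embedding as a noisy copy of the speaker center."""
    return centers[name] + 0.02 * rng.normal(size=dim)


ident = OpenSetIdentifier(threshold=0.7)
for name in ("alice", "bob"):          # closed set of known speakers
    for _ in range(3):
        ident.enroll(name, utterance(name))

first = ident.identify(utterance("alice"))   # known speaker
second = ident.identify(utterance("carol"))  # unseen speaker, gets enrolled
third = ident.identify(utterance("carol"))   # same speaker, re-identified
print(first, second, third)
```

In a real system the threshold would need to be tuned on held-out data, and a different detector (for example, a per-speaker distance distribution) could replace the fixed cut-off without changing the overall loop.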
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Affek, M., Tatara, M.S. (2023). Open-Set Speaker Identification Using Closed-Set Pretrained Embeddings. In: Kowalczuk, Z. (eds) Intelligent and Safe Computer Systems in Control and Diagnostics. DPS 2022. Lecture Notes in Networks and Systems, vol 545. Springer, Cham. https://doi.org/10.1007/978-3-031-16159-9_14
DOI: https://doi.org/10.1007/978-3-031-16159-9_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16158-2
Online ISBN: 978-3-031-16159-9
eBook Packages: Intelligent Technologies and Robotics (R0)