Open-Set Speaker Identification Using Closed-Set Pretrained Embeddings

Affek, Michał; Tatara, Marek S.

doi:10.1007/978-3-031-16159-9_14

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 545)

Included in the following conference series:

International Conference on Diagnostics of Processes and Systems

Abstract

The paper proposes an approach for extending closed-set speaker identification solutions based on deep neural networks to the open-set problem. The idea builds on a property of deep neural networks trained for classification tasks: one of their layers holds a vector of deep features extracted from the analyzed input. By extracting this vector and performing anomaly detection against the set of known speakers, new speakers can be detected and modeled for later re-identification. The approach is tested using the NeMo toolkit with the SpeakerNet architecture. The algorithm is shown to work when multiple new speakers are introduced.
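The detect-then-enroll idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which anomaly detector the authors use, so everything here is an assumption made for illustration: the cosine-similarity threshold test, the threshold value of 0.7, and the names `OpenSetSpeakerRegistry`, `enroll`, and `identify` are all hypothetical. Embeddings are taken as plain vectors; in practice they would come from a pretrained speaker model.

```python
import numpy as np


def _unit(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


class OpenSetSpeakerRegistry:
    """Hypothetical helper: one centroid per known speaker, cosine-threshold novelty test."""

    def __init__(self, threshold=0.7):
        self.threshold = threshold  # assumed value; would need tuning on held-out data
        self.sums = {}              # speaker id -> sum of unit-length embeddings
        self._next_new = 0          # counter used to name newly detected speakers

    def enroll(self, speaker_id, embedding):
        """Add one embedding to a speaker's model (creates the speaker if unseen)."""
        e = _unit(embedding)
        if speaker_id in self.sums:
            self.sums[speaker_id] = self.sums[speaker_id] + e
        else:
            self.sums[speaker_id] = e

    def identify(self, embedding):
        """Return (speaker_id, is_new); unknown voices are enrolled under a fresh id."""
        e = _unit(embedding)
        best_id, best_sim = None, -1.0
        for sid, total in self.sums.items():
            sim = float(np.dot(e, _unit(total)))  # cosine similarity to the centroid
            if sim > best_sim:
                best_id, best_sim = sid, sim
        if best_id is not None and best_sim >= self.threshold:
            return best_id, False
        # No known speaker is close enough: treat as an anomaly and model it.
        new_id = f"new_{self._next_new}"
        self._next_new += 1
        self.enroll(new_id, e)
        return new_id, True


reg = OpenSetSpeakerRegistry(threshold=0.7)
reg.enroll("alice", [1.0, 0.0, 0.0])
reg.enroll("bob", [0.0, 1.0, 0.0])
known = reg.identify([0.9, 0.1, 0.0])      # close to alice's centroid
first_seen = reg.identify([0.0, 0.0, 1.0])  # far from every known speaker
seen_again = reg.identify([0.0, 0.1, 0.9])  # close to the newly enrolled speaker
```

Because an unknown voice is enrolled at the moment it is flagged, a later utterance from the same voice matches the new centroid instead of being flagged again, which is the re-identification behaviour the abstract describes. A fixed global threshold is the simplest possible choice; a per-speaker threshold or a dedicated novelty detector could be substituted without changing the overall flow.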

Author information

Authors and Affiliations

Department of Robotics and Decision Systems, Faculty of Electronics, Telecommunications, and Informatics, Gdańsk Tech, Gdańsk, Poland
Michał Affek & Marek S. Tatara

Corresponding author

Correspondence to Marek S. Tatara.

Editor information

Editors and Affiliations

Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Gdańsk, Poland
Zdzislaw Kowalczuk

About this paper

Cite this paper

Affek, M., Tatara, M.S. (2023). Open-Set Speaker Identification Using Closed-Set Pretrained Embeddings. In: Kowalczuk, Z. (eds) Intelligent and Safe Computer Systems in Control and Diagnostics. DPS 2022. Lecture Notes in Networks and Systems, vol 545. Springer, Cham. https://doi.org/10.1007/978-3-031-16159-9_14

DOI: https://doi.org/10.1007/978-3-031-16159-9_14
Published: 01 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16158-2
Online ISBN: 978-3-031-16159-9
eBook Packages: Intelligent Technologies and Robotics (R0)
