An Analytic Study on Clustering-Based Pseudo-labels for Self-supervised Deep Speaker Verification

  • Conference paper
Speech and Computer (SPECOM 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13721)

Abstract

One of the most widely used self-supervised methods for training a speaker verification system is to generate pseudo-labels with an unsupervised clustering algorithm and then train the speaker embedding network on those pseudo-labels in a discriminative fashion. Although this pseudo-label-based self-supervised speaker embedding extraction scheme has shown impressive performance, the pseudo-label generation process itself has received little attention. In this paper, we conduct a set of experiments with several clustering algorithms to analyze how different clustering configurations affect the pseudo-label-based self-supervised training strategy for speaker verification. The experimental results show that the performance of the self-supervised speaker embedding system depends heavily on the accuracy of the pseudo-labels, and that performance can degrade severely when the network overfits to inaccurately generated pseudo-labels.
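As a rough illustration of the pseudo-label generation step described in the abstract, the sketch below clusters toy "embeddings" with a plain k-means and scores the resulting pseudo-labels against the true speakers. It is not the paper's setup: the 2-D points, the farthest-point initialization, the function names, and the use of cluster purity as the accuracy measure are all assumptions made for this example.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic farthest-point initialization (an assumption of this sketch)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point that is farthest from its nearest chosen centroid.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans_pseudo_labels(points, k, iters=10):
    """Return one cluster index per point; these indices act as pseudo-labels."""
    centroids = init_centroids(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each point takes the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def purity(pseudo, true):
    """Fraction of points that belong to the majority speaker of their cluster."""
    hits = 0
    for c in set(pseudo):
        speakers = Counter(t for p, t in zip(pseudo, true) if p == c)
        hits += speakers.most_common(1)[0][1]
    return hits / len(true)


# Toy data: two well-separated "speakers", four utterance embeddings each.
embeddings = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11)]
speakers = [0, 0, 0, 0, 1, 1, 1, 1]

purity_k1 = purity(kmeans_pseudo_labels(embeddings, 1), speakers)
purity_k2 = purity(kmeans_pseudo_labels(embeddings, 2), speakers)
print(purity_k1, purity_k2)
```

With a single cluster, both speakers share one pseudo-label and purity drops to 0.5; with two clusters the pseudo-labels match the speakers exactly. In a real pipeline the pseudo-labels would then serve as classification targets for the embedding network, which is where inaccurate labels can hurt.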

Notes

  1. https://github.com/joonson/voxceleb_unsupervised

Acknowledgments

The authors wish to acknowledge funding from the Government of Canada’s New Frontiers in Research Fund (NFRF) through grant NFRFR-2021-00338, and the Ministry of Economy and Innovation (MEI) of the Government of Quebec for its continued support.

Author information

Correspondence to Jahangir Alam.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Kang, W.H., Alam, J., Fathan, A. (2022). An Analytic Study on Clustering-Based Pseudo-labels for Self-supervised Deep Speaker Verification. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds) Speech and Computer. SPECOM 2022. Lecture Notes in Computer Science, vol 13721. Springer, Cham. https://doi.org/10.1007/978-3-031-20980-2_29

  • DOI: https://doi.org/10.1007/978-3-031-20980-2_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20979-6

  • Online ISBN: 978-3-031-20980-2

  • eBook Packages: Computer Science (R0)