Abstract
Speaker verification models trained on a single domain have difficulty maintaining their performance on data from new domains. Adversarial training addresses this problem by mapping data from different domains into a shared subspace. However, on the target domain adversarial training uses only domain labels and does not exploit the speaker information contained in the data. To improve domain adaptation for speaker verification, we propose a strategy that jointly performs adversarial training and self-supervised learning. In our method, adversarial training transfers knowledge from the source domain to the target domain, while self-supervised learning obtains speech representations from unlabeled utterances. Furthermore, our self-supervised learning uses only positive pairs, which avoids false negative samples. The proposed joint training strategy enables adversarial training to guide self-supervised learning toward the speaker verification task. Experiments show that our proposed method outperforms other domain adaptation methods.
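The abstract does not specify the loss functions, so the following is only a minimal sketch, under stated assumptions, of how such a joint objective could be composed. It assumes a SimSiam-style symmetric negative-cosine loss over two augmented views of the same unlabeled target-domain utterance (positive pairs only), a binary domain-classification loss as in domain-adversarial training with a gradient reversal layer, and a supervised speaker loss on the labeled source domain. All function names and the weights `lam` and `mu` are hypothetical and are not the authors' values. NumPy computes only forward loss values; the stop-gradient on the targets and the gradient sign flip would be handled by an autodiff framework.

```python
import numpy as np


def negative_cosine(p: np.ndarray, z: np.ndarray) -> float:
    """Mean negative cosine similarity between predictions p and targets z.

    Assumption: SimSiam-style loss. In an autodiff framework, z would be
    wrapped in a stop-gradient; plain NumPy only computes the loss value.
    """
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return float(-np.mean(np.sum(p * z, axis=1)))


def positive_pair_loss(p1, z1, p2, z2) -> float:
    """Symmetric loss over two augmented views of the same utterance.

    Only positive pairs appear: no other utterance is treated as a negative,
    so two utterances from the same unlabeled speaker are never pushed apart.
    """
    return 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)


def domain_cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Binary domain-classification loss (label 0 = source, 1 = target)."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-12  # avoids log(0)
    return float(-np.mean(labels * np.log(probs + eps)
                          + (1 - labels) * np.log(1 - probs + eps)))


def joint_objective(spk_loss: float, ssl_loss: float, dom_loss: float,
                    lam: float = 0.1, mu: float = 1.0) -> float:
    """Hypothetical objective seen by the shared embedding network.

    The domain term is subtracted because a gradient reversal layer makes the
    encoder maximize the domain loss, while the domain classifier itself
    minimizes +dom_loss. lam and mu are illustrative weights.
    """
    return spk_loss + mu * ssl_loss - lam * dom_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for predictor outputs (p) and projector outputs (z) of two views.
    p1, z1, p2, z2 = (rng.normal(size=(8, 16)) for _ in range(4))
    ssl = positive_pair_loss(p1, z1, p2, z2)
    dom = domain_cross_entropy(rng.normal(size=16), rng.integers(0, 2, size=16))
    print("ssl:", ssl, "dom:", dom, "joint:", joint_objective(1.0, ssl, dom))
```

In a full implementation, `spk_loss` would come from a speaker classifier trained on labeled source data, and the same encoder would feed both the self-supervised branch and the domain classifier, which is how the adversarial term could steer the self-supervised representations toward speaker-relevant features.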
Q. Li and J. Qiang contributed equally to this research.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, Q., Qiang, J., Yang, Q. (2024). Domain Adaptation for Speaker Verification Based on Self-supervised Learning with Adversarial Training. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14554. Springer, Cham. https://doi.org/10.1007/978-3-031-53305-1_29
DOI: https://doi.org/10.1007/978-3-031-53305-1_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53304-4
Online ISBN: 978-3-031-53305-1
eBook Packages: Computer Science, Computer Science (R0)