DOI: 10.1145/3704323.3704389
research-article

Unsupervised Source Free Domain Adaptation by Batch Normalization for Speaker Verification

Published: 07 January 2025

Abstract

Speaker verification systems degrade when they are applied to a new dataset (the target domain) whose data distribution differs from that of the training dataset (the source domain). A common approach to domain adaptation is to fine-tune the model on labeled target-domain data, but labeling data is time-consuming and laborious. Another popular approach adapts the model using source data together with unlabeled target data. However, the source-domain data may not be accessible during adaptation because of privacy concerns. To address this problem, this paper proposes an unsupervised, source-free batch normalization approach to domain adaptation for speaker verification. Only a model pre-trained on the source domain and unlabeled target-domain data are needed: we partially adapt the statistics of the batch normalization layers to the target domain using a decaying momentum factor. We conducted a cross-lingual adaptation experiment with VoxCeleb2 as the source domain and CN-Celeb1 as the target domain. Experimental results show that, using less than 5% of the unlabeled target-domain data, our approach achieves a significant improvement over the baseline. It is also competitive with state-of-the-art approaches.
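
The abstract does not give the update rule, so the following is only a minimal sketch of what partially adapting batch normalization statistics with a decaying momentum could look like for a single feature channel. The function name `adapt_bn_stats`, the geometric decay schedule, and the default values of `momentum`, `decay`, and `min_momentum` are all assumptions made for illustration, not the paper's actual settings.

```python
from statistics import fmean, pvariance


def adapt_bn_stats(running_mean, running_var, batches,
                   momentum=0.1, decay=0.94, min_momentum=0.005):
    """Move source-domain running statistics toward unlabeled target batches.

    Hypothetical schedule (an assumption, not taken from the paper):
    the momentum used for batch k is max(momentum * decay**k, min_momentum),
    so early target batches shift the statistics more than later ones.
    """
    mean, var = running_mean, running_var
    current = momentum
    for batch in batches:
        # Statistics of the current unlabeled target-domain batch.
        batch_mean = fmean(batch)
        batch_var = pvariance(batch, mu=batch_mean)

        # Exponential moving average with the current momentum.
        m = max(current, min_momentum)
        mean = (1.0 - m) * mean + m * batch_mean
        var = (1.0 - m) * var + m * batch_var

        # Shrink the momentum for the next batch.
        current *= decay
    return mean, var


if __name__ == "__main__":
    # Source statistics (mean 0, variance 1) and three toy target batches.
    target_batches = [[1.8, 2.2, 2.0], [2.1, 1.9, 2.0], [2.3, 1.7, 2.0]]
    print(adapt_bn_stats(0.0, 1.0, target_batches))
```

Because the momentum shrinks at every step, the adapted statistics end up between the source and target values rather than being replaced outright, which is one way to read "partially adapt". No labels and no source data are used in the loop.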


      Published In

      ICCPR '24: Proceedings of the 2024 13th International Conference on Computing and Pattern Recognition
      October 2024
      448 pages
      ISBN:9798400717482
      DOI:10.1145/3704323
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. speaker verification
      2. domain adaptation
      3. batch normalization

      Qualifiers

      • Research-article

      Funding Sources

      • National Natural Science Foundation of China
      • R&D Program of Beijing Municipal Education Commission

      Conference

      ICCPR 2024
