research-article

Unsupervised Source Free Domain Adaptation by Batch Normalization for Speaker Verification

Authors:

Wenmeng Xiong

ICCPR '24: Proceedings of the 2024 13th International Conference on Computing and Pattern Recognition

Pages 412 - 417

https://doi.org/10.1145/3704323.3704389

Published: 07 January 2025

Abstract

The performance of a speaker verification system degrades when it is applied to a new dataset (the target domain) rather than the dataset it was trained on (the source domain), because the data distributions differ. A common approach to domain adaptation is to fine-tune the system on labeled target domain data, but data labeling is time-consuming and laborious. Another popular approach adapts the system using source data together with unlabeled target data. However, the source domain data may not be accessible during domain adaptation due to privacy concerns. To address this problem, this paper proposes an unsupervised source-free batch normalization approach for domain adaptation of speaker verification. Specifically, only a model pre-trained on the source domain and unlabeled target domain data are needed. We partially adapt the statistics of the batch normalization layers to the target domain using a decaying momentum factor. We conducted a cross-lingual adaptation experiment using VoxCeleb2 as the source domain and CN-Celeb1 as the target domain. Experimental results show that, using less than 5% of the unlabeled target domain data, our approach achieves a significant improvement over the baseline. Our approach is also competitive with state-of-the-art approaches.
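The idea of partially adapting batch normalization statistics with a decaying momentum factor can be illustrated with a minimal sketch. The abstract does not give the exact update rule or schedule, so everything concrete here is an assumption made for illustration: the function name `adapt_bn_stats`, the exponential decay of the momentum, the floor `min_momentum`, and the default values. The sketch tracks a single feature channel and uses only the Python standard library.

```python
from statistics import fmean, pvariance


def adapt_bn_stats(running_mean, running_var, batches,
                   momentum=0.1, decay=0.9, min_momentum=0.005):
    """Blend source-domain BN statistics toward target-domain statistics.

    Hypothetical sketch: the decay schedule and default values are
    assumptions, not taken from the paper.
    running_mean, running_var: statistics stored in the pre-trained model.
    batches: iterable of lists of unlabeled target-domain activations
             for one feature channel.
    """
    for batch in batches:
        # Statistics of the current unlabeled target batch.
        batch_mean = fmean(batch)
        batch_var = pvariance(batch, mu=batch_mean)

        # Exponential moving average: keep (1 - momentum) of the old
        # statistics and take momentum from the target batch.
        running_mean = (1 - momentum) * running_mean + momentum * batch_mean
        running_var = (1 - momentum) * running_var + momentum * batch_var

        # Shrink the momentum so later batches change the statistics less,
        # which keeps the adaptation partial rather than a full overwrite.
        momentum = max(momentum * decay, min_momentum)

    return running_mean, running_var


if __name__ == "__main__":
    # Source statistics (mean 0, variance 1) nudged by three target batches.
    target_batches = [[1.8, 2.2, 2.0], [2.1, 1.9, 2.0], [2.3, 1.7, 2.0]]
    mean, var = adapt_bn_stats(0.0, 1.0, target_batches)
    print(mean, var)
```

Because the momentum decays, the earliest target batches contribute the most and the statistics settle somewhere between the source and target values. No labels and no source data are involved, only the stored statistics and unlabeled target activations.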

Index Terms

1. Applied computing
  1. Physical sciences and engineering
2. Software and its engineering
  1. Software organization and properties

Published In

ICCPR '24: Proceedings of the 2024 13th International Conference on Computing and Pattern Recognition

October 2024

448 pages

ISBN:9798400717482

DOI:10.1145/3704323

Copyright © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Natural Science Foundation of China
R&D Program of Beijing Municipal Education Commission

Conference

ICCPR 2024: 2024 13th International Conference on Computing and Pattern Recognition

October 25 - 27, 2024

Tianjin, China
