Jointing Multi-task Learning and Gradient Reversal Layer for Far-Field Speaker Verification

Xu, Wei; Wang, Xinghao; Wan, Hao; Guo, Xin; Zhao, Junhong; Deng, Feiqi; Kang, Wenxiong

doi:10.1007/978-3-030-86608-2_49

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12878)

Included in the following conference series:

Chinese Conference on Biometric Recognition

Abstract

Far-field speaker verification is challenging because of the interference caused by varying distances between the speaker and the recording device. In this paper, a distance discriminator, which determines whether two utterances were recorded at the same distance, is used as an auxiliary task to learn distance discrepancy information. Two identical auxiliary tasks are used: one is added before the speaker embedding layer to learn distance discrepancy information via multi-task learning, and the other is added after that layer to suppress the learned discrepancy via a gradient reversal layer. In addition, to avoid conflicts among the optimization directions of the tasks, the loss weight of each task is updated dynamically during training. Experiments on AISHELL Wake-up show relative reductions in equal error rate (EER) of 7% on far-far and 10.3% on near-far speaker verification compared with the single-task model, demonstrating the effectiveness of the proposed method.
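The key mechanism in the abstract is the gradient reversal layer (GRL), introduced by Ganin and Lempitsky for domain adaptation: it is the identity in the forward pass and multiplies the incoming gradient by -λ in the backward pass, so the discriminator still learns to detect a distance mismatch while the layers below it are pushed to remove that information. The pure-Python sketch below illustrates this on a deliberately tiny, hypothetical model: a scalar encoder weight `w`, a pair feature `d = (h1 - h2)**2`, and a logistic same-distance discriminator with parameters `v` and `b`. All names are illustrative assumptions; this is not the authors' implementation, and it omits the speaker loss and the dynamic loss weighting. Calling it without a GRL corresponds to the multi-task branch; passing a `GradientReversal` instance corresponds to the adversarial branch.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class GradientReversal:
    """Identity in the forward pass; scales the incoming gradient by -lambd in the backward pass."""

    def __init__(self, lambd: float = 1.0):
        self.lambd = lambd

    def forward(self, x: float) -> float:
        return x

    def backward(self, grad_out: float) -> float:
        return -self.lambd * grad_out


def distance_pair_step(w, v, b, x1, x2, y, grl=None):
    """Toy same-distance discriminator on one utterance pair, with manual backprop.

    Hypothetical setup: scalar encoder e_i = w * x_i, pair feature d = (h1 - h2)**2,
    logit z = b - v * d, p = sigmoid(z) = P(same distance); y = 1 means same distance.
    If `grl` is given, it sits between the encoder outputs and the discriminator.
    Returns (loss, dL/dw, dL/dv, dL/db).
    """
    e1, e2 = w * x1, w * x2
    if grl is not None:
        h1, h2 = grl.forward(e1), grl.forward(e2)
    else:
        h1, h2 = e1, e2
    d = (h1 - h2) ** 2
    z = b - v * d
    p = sigmoid(z)
    loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

    dz = p - y                         # dL/dz for binary cross-entropy on a sigmoid output
    dv, db = dz * (-d), dz             # discriminator gradients are never reversed
    dh1 = dz * (-v) * 2.0 * (h1 - h2)  # gradient w.r.t. the discriminator inputs
    dh2 = -dh1
    if grl is not None:
        dh1, dh2 = grl.backward(dh1), grl.backward(dh2)
    dw = dh1 * x1 + dh2 * x2           # gradient reaching the shared encoder weight
    return loss, dw, dv, db


if __name__ == "__main__":
    # Hypothetical pair recorded at different distances (y = 0).
    mtl = distance_pair_step(0.5, 1.0, 0.2, 1.0, 3.0, y=0)
    adv = distance_pair_step(0.5, 1.0, 0.2, 1.0, 3.0, y=0, grl=GradientReversal(1.0))
    print("multi-task branch dL/dw:", mtl[1])
    print("adversarial branch dL/dw:", adv[1])
```

With the same inputs, both branches report the same discriminator loss and the same gradients for `v` and `b`; only the gradient reaching the encoder weight `w` flips sign (scaled by λ). In a full network, these encoder gradients would be added to the speaker-classification gradient, each scaled by its task's loss weight.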

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant 61573151 and Grant 61976095 and the Science and Technology Planning Project of Guangdong Province under Grant 2018B030323026.

Author information

Authors and Affiliations

School of Automation Science and Engineering, South China University of Technology, Guangzhou, 510641, China
Wei Xu, Xinghao Wang, Hao Wan, Junhong Zhao, Feiqi Deng & Wenxiong Kang
Guangdong Baiyun Airport Information Technology Co., Ltd. Postdoctoral Innovation Base, Guangzhou, China
Hao Wan
Guangdong Communication Polytechnic, Guangzhou, China
Xin Guo

Corresponding author

Correspondence to Junhong Zhao.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Jianjiang Feng
Fudan University, Shanghai, China
Junping Zhang
Shanghai Jiao Tong University, Shanghai, China
Manhua Liu
Shanghai University, Shanghai, China
Yuchun Fang

About this paper

Cite this paper

Xu, W. et al. (2021). Jointing Multi-task Learning and Gradient Reversal Layer for Far-Field Speaker Verification. In: Feng, J., Zhang, J., Liu, M., Fang, Y. (eds) Biometric Recognition. CCBR 2021. Lecture Notes in Computer Science, vol 12878. Springer, Cham. https://doi.org/10.1007/978-3-030-86608-2_49

DOI: https://doi.org/10.1007/978-3-030-86608-2_49
Published: 08 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86607-5
Online ISBN: 978-3-030-86608-2
eBook Packages: Computer Science, Computer Science (R0)