Jointing Multi-task Learning and Gradient Reversal Layer for Far-Field Speaker Verification

  • Conference paper
Biometric Recognition (CCBR 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12878)

Abstract

Far-field speaker verification is challenging because of interference caused by varying distances between the speaker and the recorder. In this paper, a distance discriminator, which determines whether two utterances were recorded at the same distance, is used as an auxiliary task to learn distance discrepancy information. Two identical auxiliary tasks are used: one is added before the speaker embedding layer to learn distance discrepancy information via multi-task learning, and the other is added after that layer to suppress the learned discrepancy via a gradient reversal layer. In addition, to avoid conflicts among the optimization directions of the tasks, the loss weight of each task is updated dynamically during training. Experiments on AISHELL Wake-up show relative reductions in equal error rate (EER) of 7% and 10.3% on far-far and near-far speaker verification, respectively, compared with the single-task model, demonstrating the effectiveness of the proposed method.
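
To make the gradient reversal idea concrete, the following is a minimal, dependency-free sketch of how such a layer behaves in general: it passes features through unchanged in the forward pass and flips the sign of the gradient in the backward pass, so the layers below it are pushed to make the discriminator's task harder. This is not the authors' implementation. The class name `GradientReversal`, the scaling factor `lam`, the toy linear discriminator, and its squared-error loss are all assumptions chosen for illustration.

```python
class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam on the way back."""

    def __init__(self, lam=1.0):
        self.lam = lam  # hypothetical reversal strength, not taken from the paper

    def forward(self, x):
        # Features pass through unchanged.
        return list(x)

    def backward(self, grad_output):
        # The gradient from the discriminator is sign-flipped and scaled.
        return [-self.lam * g for g in grad_output]


def discriminator_grad(features, weights, target):
    """Gradient of 0.5 * (w . x - target)^2 with respect to the features x.

    A toy linear discriminator with squared-error loss (illustrative only).
    """
    err = sum(w * x for w, x in zip(weights, features)) - target
    return [err * w for w in weights]


grl = GradientReversal(lam=0.5)
embedding = [1.0, 2.0]

features = grl.forward(embedding)                             # unchanged features
grad_at_head = discriminator_grad(features, [0.5, -1.0], 1.0)
grad_at_encoder = grl.backward(grad_at_head)                  # reversed gradient

print(grad_at_head)     # [-1.25, 2.5]
print(grad_at_encoder)  # [0.625, -1.25]
```

In an autograd framework the same behaviour is usually written as a custom differentiable function, so the discriminator head is still trained normally while the shared layers receive the reversed gradient.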

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant 61573151 and Grant 61976095 and the Science and Technology Planning Project of Guangdong Province under Grant 2018B030323026.

Author information

Corresponding author

Correspondence to Junhong Zhao.

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this paper

Xu, W. et al. (2021). Jointing Multi-task Learning and Gradient Reversal Layer for Far-Field Speaker Verification. In: Feng, J., Zhang, J., Liu, M., Fang, Y. (eds) Biometric Recognition. CCBR 2021. Lecture Notes in Computer Science, vol 12878. Springer, Cham. https://doi.org/10.1007/978-3-030-86608-2_49

  • DOI: https://doi.org/10.1007/978-3-030-86608-2_49

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86607-5

  • Online ISBN: 978-3-030-86608-2

  • eBook Packages: Computer Science (R0)
