Practical Backdoor Attack Against Speaker Recognition System

  • Conference paper
Information Security Practice and Experience (ISPEC 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13620)

Abstract

Deep learning-based models have achieved state-of-the-art performance in a wide variety of classification and recognition tasks. Although such models have been shown to be vulnerable to backdoor attacks in multiple domains, little is known about whether speaker recognition systems are vulnerable to such attacks, especially in the physical world. In this paper, we launch backdoor attacks on a speaker recognition system (SRS) in both digital and physical space and conduct comprehensive experiments on two common speaker recognition tasks. Taking the poison position, intensity, length, frequency characteristics, and poison rate of the backdoor patterns into consideration, we design four backdoor triggers and use them to poison the training dataset. We report the attack success rate (ASR) for digital and physical attacks and show that all four backdoor patterns achieve over 89% ASR in digital attacks and at least 70% in physical attacks. We also show that the maliciously trained model provides comparable performance on clean data.
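The poisoning step and the ASR metric named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the four trigger designs, so a single sine tone stands in for a trigger, parameterised by position, length, frequency, and intensity. The function names (`add_tone_trigger`, `poison_dataset`, `attack_success_rate`), the relabel-to-target poisoning strategy, and the `predict` callable are all assumptions made for illustration.

```python
import numpy as np

def add_tone_trigger(wave, sr, start_s, length_s, freq_hz, intensity):
    """Overlay a sine tone (hypothetical trigger) on a copy of `wave`.

    `wave` holds float samples in [-1, 1]; `start_s` and `length_s` set the
    poison position and length, `freq_hz` and `intensity` the tone itself.
    """
    out = wave.astype(np.float64)  # astype returns a new array
    start = int(start_s * sr)
    stop = min(start + int(length_s * sr), out.size)
    if start >= stop:
        return out  # trigger falls outside the clip; nothing to add
    t = np.arange(stop - start) / sr
    out[start:stop] += intensity * np.sin(2 * np.pi * freq_hz * t)
    return np.clip(out, -1.0, 1.0)

def poison_dataset(waves, labels, target_label, poison_rate, sr, rng, **trigger):
    """Stamp the trigger on a `poison_rate` fraction of clips and relabel them.

    Relabelling to `target_label` is an assumed (dirty-label) strategy.
    Returns new lists plus the set of poisoned indices; inputs are untouched.
    """
    n = len(waves)
    n_poison = int(round(poison_rate * n))
    chosen = rng.choice(n, size=n_poison, replace=False)
    new_waves = [w.copy() for w in waves]
    new_labels = list(labels)
    for i in chosen:
        new_waves[i] = add_tone_trigger(new_waves[i], sr, **trigger)
        new_labels[i] = target_label
    return new_waves, new_labels, set(chosen.tolist())

def attack_success_rate(predict, waves, labels, target_label, sr, **trigger):
    """Fraction of non-target clips classified as the target once triggered.

    `predict` is any callable mapping a waveform to a predicted label.
    """
    hits = total = 0
    for w, y in zip(waves, labels):
        if y == target_label:
            continue  # clips already in the target class are not counted
        total += 1
        hits += int(predict(add_tone_trigger(w, sr, **trigger)) == target_label)
    return hits / total if total else 0.0
```

A call such as `poison_dataset(waves, labels, target_label=7, poison_rate=0.05, sr=16000, rng=np.random.default_rng(0), start_s=0.1, length_s=0.2, freq_hz=1000.0, intensity=0.1)` would poison 5% of the clips; all of these values are placeholders rather than settings reported in the paper.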

Notes

  1. Refer to [15] for a more detailed description of the model.

References

  1. Agarap, A.F.: Deep learning using rectified linear units (ReLU). arXiv preprint arXiv:1803.08375 (2018)

  2. Bhattacharya, G., Alam, M.J., Kenny, P.: Deep speaker recognition: modular or monolithic? In: INTERSPEECH, pp. 1143–1147 (2019)

  3. Chen, X., Liu, C., Li, B., Lu, K., Song, D.: Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526 (2017)

  4. Chung, S.P., Mok, A.K.: Allergy attack against automatic signature generation. In: Zamboni, D., Kruegel, C. (eds.) RAID 2006. LNCS, vol. 4219, pp. 61–80. Springer, Heidelberg (2006). https://doi.org/10.1007/11856214_4

  5. Chung, S.P., Mok, A.K.: Advanced allergy attacks: does a corpus really help? In: Kruegel, C., Lippmann, R., Clark, A. (eds.) RAID 2007. LNCS, vol. 4637, pp. 236–255. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74320-0_13

  6. Conneau, A., Schwenk, H., Barrault, L., Lecun, Y.: Very deep convolutional networks for text classification. arXiv preprint arXiv:1606.01781 (2016)

  7. Dalvi, N., Domingos, P., Sanghai, S., Verma, D.: Adversarial classification. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 99–108 (2004)

  8. Dehak, N., Kenny, P.J., Dehak, R., Dumouchel, P., Ouellet, P.: Front-end factor analysis for speaker verification. IEEE Trans. Audio Speech Lang. Process. 19(4), 788–798 (2010)

  9. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc.: Ser. B (Methodol.) 39(1), 1–22 (1977)

  10. Fortuna, J., Sivakumaran, P., Ariyaeeinia, A., Malegaonkar, A.: Open-set speaker identification using adapted Gaussian mixture models. In: Ninth European Conference on Speech Communication and Technology (2005)

  11. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)

  12. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, vol. 27 (2014)

  13. Gu, T., Dolan-Gavitt, B., Garg, S.: BadNets: identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733 (2017)

  14. Han, J., Moraga, C.: The influence of the sigmoid function parameters on the speed of backpropagation learning. In: Mira, J., Sandoval, F. (eds.) IWANN 1995. LNCS, vol. 930, pp. 195–201. Springer, Heidelberg (1995). https://doi.org/10.1007/3-540-59497-3_175

  15. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015)

  16. Huang, L., Joseph, A.D., Nelson, B., Rubinstein, B.I., Tygar, J.D.: Adversarial machine learning. In: Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, pp. 43–58 (2011)

  17. Huang, Y.Y., Wang, W.Y.: Deep residual learning for weakly-supervised relation extraction. arXiv preprint arXiv:1707.08866 (2017)

  18. Koffas, S., Xu, J., Conti, M., Picek, S.: Can you hear it? Backdoor attacks via ultrasonic triggers. arXiv preprint arXiv:2107.14569 (2021)

  19. Lowd, D., Meek, C.: Adversarial learning. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 641–647 (2005)

  20. Lowd, D., Meek, C.: Good word attacks on statistical spam filters. In: CEAS, vol. 2005 (2005)

  21. McLaren, M., Ferrer, L., Castan, D., Lawson, A.: The speakers in the wild (SITW) speaker recognition database. In: Interspeech, pp. 818–822 (2016)

  22. Muda, L., Begam, M., Elamvazuthi, I.: Voice recognition algorithms using Mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques. arXiv preprint arXiv:1003.4083 (2010)

  23. Multimodal Information Group (2022). https://www.nist.gov/itl/iad/mig/speaker-recognition

  24. Nagrani, A., Chung, J.S., Xie, W., Zisserman, A.: VoxCeleb: large-scale speaker verification in the wild. Comput. Speech Lang. 60, 101027 (2020)

  25. Nandwana, M.K., Ferrer, L., McLaren, M., Castan, D., Lawson, A.: Analysis of critical metadata factors for the calibration of speaker recognition systems. In: INTERSPEECH, pp. 4325–4329 (2019)

  26. Newsome, J., Karp, B., Song, D.: Paragraph: thwarting signature learning by training maliciously. In: Zamboni, D., Kruegel, C. (eds.) RAID 2006. LNCS, vol. 4219, pp. 81–105. Springer, Heidelberg (2006). https://doi.org/10.1007/11856214_5

  27. Reynolds, D.A.: Gaussian mixture models. Encyclopedia of Biometrics 741, 659–663 (2009)

  28. Reynolds, D.A., Quatieri, T.F., Dunn, R.B.: Speaker verification using adapted Gaussian mixture models. Digit. Signal Process. 10(1–3), 19–41 (2000)

  29. Reynolds, D.A., Rose, R.C.: Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans. Speech Audio Process. 3(1), 72–83 (1995)

  30. Saha, A., Subramanya, A., Pirsiavash, H.: Hidden trigger backdoor attacks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11957–11965 (2020)

  31. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

  32. Snyder, D., Garcia-Romero, D., Sell, G., McCree, A., Povey, D., Khudanpur, S.: Speaker recognition for multi-speaker conversations using X-vectors. In: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), ICASSP 2019, pp. 5796–5800. IEEE (2019)

  33. Snyder, D., Garcia-Romero, D., Sell, G., Povey, D., Khudanpur, S.: X-vectors: robust DNN embeddings for speaker recognition. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5329–5333. IEEE (2018)

  34. Turner, A., Tsipras, D., Madry, A.: Clean-label backdoor attacks (2018)

  35. Wittel, G.L., Wu, S.F.: On attacking statistical spam filters. In: CEAS. Citeseer (2004)

  36. Xu, M., Duan, L.-Y., Cai, J., Chia, L.-T., Xu, C., Tian, Q.: HMM-based audio keyword generation. In: Aizawa, K., Nakamura, Y., Satoh, S. (eds.) PCM 2004. LNCS, vol. 3333, pp. 566–574. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30543-9_71

  37. Ye, J., Liu, X., You, Z., Li, G., Liu, B.: DriNet: dynamic backdoor attack against automatic speech recognization models. Appl. Sci. 12(12), 5786 (2022)

  38. Zhai, T., Li, Y., Zhang, Z.M., Wu, B., Jiang, Y., Xia, S.: Backdoor attack against speaker verification. In: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2560–2564 (2021)

Acknowledgement

We would like to thank the reviewers for their helpful comments. Jianwei Tai is supported by the National Key Research and Development Program of China (No. 2019YFE0110300) and the National Natural Science Foundation of China under Grants 71971075, 72271076, and 71871079. Xiaoqi Jia is supported in part by the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDC02010900) and the National Key Research and Development Program of China (No. 2019YFB1005201 and No. 2021YFB2910109).

Author information

Corresponding author

Correspondence to Shengzhi Zhang.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Luo, Y., Tai, J., Jia, X., Zhang, S. (2022). Practical Backdoor Attack Against Speaker Recognition System. In: Su, C., Gritzalis, D., Piuri, V. (eds) Information Security Practice and Experience. ISPEC 2022. Lecture Notes in Computer Science, vol 13620. Springer, Cham. https://doi.org/10.1007/978-3-031-21280-2_26

  • DOI: https://doi.org/10.1007/978-3-031-21280-2_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21279-6

  • Online ISBN: 978-3-031-21280-2

  • eBook Packages: Computer Science, Computer Science (R0)
