Efficient Black-Box Adversarial Attacks with Training Surrogate Models Towards Speaker Recognition Systems

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2023)

Abstract

Speaker Recognition Systems (SRSs) are increasingly adopting Deep Neural Networks (DNNs) as their core architecture, while attackers exploit the weaknesses of DNNs to launch adversarial attacks. Previous studies generate adversarial examples by injecting human-imperceptible noise, derived from the gradients of the audio data, into the input; such attacks are termed white-box attacks. However, these attacks are impractical in real-world scenarios because they depend heavily on the internal information of the target classifier. To address this constraint, this study proposes a method that operates under a black-box condition, in which the attacker can only estimate the internal information of the model by interacting with its inputs and outputs. We draw on substitution-based and transfer-based methods to train various surrogate models that imitate the target models. Our method combines these surrogate models with white-box attacks such as the Momentum Iterative Fast Gradient Sign Method (MI-FGSM) and the Enhanced Momentum Iterative Fast Gradient Sign Method (EMI-FGSM) to boost the performance of the adversarial attacks. Furthermore, a transferability analysis is conducted on multiple models under cross-architecture, cross-feature, and cross-architecture-feature conditions. Additionally, a frequency analysis provides valuable findings on how to adjust the parameters of the attack algorithms. Extensive experiments validate that our attack achieves notably better performance than previous studies.
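For readers who want a concrete picture of the attack loop described above, the following is a minimal, hedged sketch of running a momentum-based iterative attack (MI-FGSM) against a trained surrogate model. The `surrogate` model, the input tensors, and all hyperparameter values are illustrative assumptions; the paper's surrogate training procedure and its EMI-FGSM variant are not reproduced here.

```python
# Illustrative MI-FGSM sketch against a surrogate speaker-recognition model (PyTorch).
# `surrogate`, `waveform`, and `speaker_label` are hypothetical placeholders, not the
# authors' actual models or data. Untargeted attack: push the surrogate away from the
# true speaker label within an L-infinity budget `eps` on the raw waveform.
import torch
import torch.nn.functional as F

def mi_fgsm(surrogate, waveform, speaker_label, eps=0.002, steps=10, mu=1.0):
    """waveform: (batch, samples) in [-1, 1]; speaker_label: (batch,) class indices."""
    alpha = eps / steps                       # per-step perturbation budget
    x_adv = waveform.clone().detach()
    momentum = torch.zeros_like(waveform)     # accumulated gradient direction

    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        logits = surrogate(x_adv)             # surrogate stands in for the black-box SRS
        loss = F.cross_entropy(logits, speaker_label)
        grad = torch.autograd.grad(loss, x_adv)[0]

        # Momentum update: normalise the gradient (here by its mean absolute value,
        # proportional to the L1 norm used in MI-FGSM) before accumulating.
        grad = grad / grad.abs().mean(dim=-1, keepdim=True).clamp_min(1e-12)
        momentum = mu * momentum + grad

        # Signed ascent step, then projection back into the L-inf ball and valid range.
        x_adv = x_adv.detach() + alpha * momentum.sign()
        x_adv = torch.min(torch.max(x_adv, waveform - eps), waveform + eps)
        x_adv = x_adv.clamp(-1.0, 1.0)

    return x_adv.detach()
```

In the transfer setting, the resulting adversarial waveform would then be submitted to the black-box target system; the momentum term is what is generally credited with improving cross-model transferability of such examples.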



Acknowledgements

This research was funded by NSFC under Grant 61572170, Natural Science Foundation of Hebei Province under Grant F2021205004, Science and Technology Foundation Project of Hebei Normal University under Grant L2021K06, Science Foundation of Returned Overseas of Hebei Province under Grant C2020342, and Key Science Foundation of Hebei Education Department under Grant ZD2021062.

Author information


Corresponding author

Correspondence to Changguang Wang.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wang, F., Song, R., Li, Q., Wang, C. (2024). Efficient Black-Box Adversarial Attacks with Training Surrogate Models Towards Speaker Recognition Systems. In: Tari, Z., Li, K., Wu, H. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2023. Lecture Notes in Computer Science, vol 14491. Springer, Singapore. https://doi.org/10.1007/978-981-97-0808-6_15

  • DOI: https://doi.org/10.1007/978-981-97-0808-6_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0807-9

  • Online ISBN: 978-981-97-0808-6

  • eBook Packages: Computer Science, Computer Science (R0)
