Efficient Black-Box Adversarial Attacks with Training Surrogate Models Towards Speaker Recognition Systems

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2023)

Abstract

Speaker Recognition Systems (SRSs) are increasingly adopting Deep Neural Networks (DNNs) as their core architecture, while attackers exploit the weaknesses of DNNs to launch adversarial attacks. Previous studies generate adversarial examples by injecting human-imperceptible noise, derived from the gradients of the audio data, into the input; such attacks are termed white-box attacks. However, these attacks are impractical in real-world scenarios because they depend heavily on the internal information of the target classifier. To address this constraint, this study proposes a method that operates under a black-box condition, in which the attacker can only estimate the internal information of the model by interacting with its inputs and outputs. We draw on substitution-based and transfer-based methods to train various surrogate models that imitate the target models. Our method combines these surrogate models with white-box attacks such as the Momentum Iterative Fast Gradient Sign Method (MI-FGSM) and the Enhanced Momentum Iterative Fast Gradient Sign Method (EMI-FGSM) to boost the performance of the adversarial attacks. Furthermore, a transferability analysis is conducted on multiple models under cross-architecture, cross-feature, and cross-architecture-feature conditions. Additionally, a frequency analysis provides valuable findings on how to adjust the parameters of the attack algorithms. Extensive experiments validate that our attack achieves notably better performance than previous studies.
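For readers who want a concrete picture of the attack loop described above, the following is a minimal, hedged sketch of running a momentum-based iterative attack (MI-FGSM) against a trained surrogate model. The `surrogate` model, the input tensors, and all hyperparameter values are illustrative assumptions; the paper's surrogate training procedure and its EMI-FGSM variant are not reproduced here.

```python
# Illustrative MI-FGSM sketch against a surrogate speaker-recognition model (PyTorch).
# `surrogate`, `waveform`, and `speaker_label` are hypothetical placeholders, not the
# authors' actual models or data. Untargeted attack: push the surrogate away from the
# true speaker label within an L-infinity budget `eps` on the raw waveform.
import torch
import torch.nn.functional as F

def mi_fgsm(surrogate, waveform, speaker_label, eps=0.002, steps=10, mu=1.0):
    """waveform: (batch, samples) in [-1, 1]; speaker_label: (batch,) class indices."""
    alpha = eps / steps                       # per-step perturbation budget
    x_adv = waveform.clone().detach()
    momentum = torch.zeros_like(waveform)     # accumulated gradient direction

    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        logits = surrogate(x_adv)             # surrogate stands in for the black-box SRS
        loss = F.cross_entropy(logits, speaker_label)
        grad = torch.autograd.grad(loss, x_adv)[0]

        # Momentum update: normalise the gradient (here by its mean absolute value,
        # proportional to the L1 norm used in MI-FGSM) before accumulating.
        grad = grad / grad.abs().mean(dim=-1, keepdim=True).clamp_min(1e-12)
        momentum = mu * momentum + grad

        # Signed ascent step, then projection back into the L-inf ball and valid range.
        x_adv = x_adv.detach() + alpha * momentum.sign()
        x_adv = torch.min(torch.max(x_adv, waveform - eps), waveform + eps)
        x_adv = x_adv.clamp(-1.0, 1.0)

    return x_adv.detach()
```

In the transfer setting, the resulting adversarial waveform would then be submitted to the black-box target system; the momentum term is what is generally credited with improving cross-model transferability of such examples.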



Acknowledgements

This research was funded by NSFC under Grant 61572170, Natural Science Foundation of Hebei Province under Grant F2021205004, Science and Technology Foundation Project of Hebei Normal University under Grant L2021K06, Science Foundation of Returned Overseas of Hebei Province under Grant C2020342, and Key Science Foundation of Hebei Education Department under Grant ZD2021062.

Author information


Corresponding author

Correspondence to Changguang Wang.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wang, F., Song, R., Li, Q., Wang, C. (2024). Efficient Black-Box Adversarial Attacks with Training Surrogate Models Towards Speaker Recognition Systems. In: Tari, Z., Li, K., Wu, H. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2023. Lecture Notes in Computer Science, vol 14491. Springer, Singapore. https://doi.org/10.1007/978-981-97-0808-6_15

  • DOI: https://doi.org/10.1007/978-981-97-0808-6_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0807-9

  • Online ISBN: 978-981-97-0808-6

  • eBook Packages: Computer Science, Computer Science (R0)
