Abstract
Deep neural network-based keyword spotting (KWS) has achieved tremendous success in smart speech assistant applications. However, neural network-based KWS has been shown to be susceptible to adversarial examples. Efficient adversarial example generation would help mitigate the security flaws of network-based KWS through adversarial training. In this paper, we propose using a conditional generative adversarial network (CGAN) to efficiently generate speech adversarial examples. Specifically, we first present a target label embedding method that maps each class label into feature maps. We then use a generative adversarial network to construct targeted speech adversarial examples from these feature maps. The target KWS classification network is integrated into the CGAN framework: its classification error is back-propagated via gradient flow to guide the generator updates, while the target network itself remains frozen. The proposed method is evaluated on a set of state-of-the-art deep learning-based KWS classification networks. The results validate the effectiveness of the generated adversarial examples. In addition, the experiments demonstrate the transferability of the generated adversarial examples across different KWS classification networks.
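The core training loop described in the abstract (a label-conditioned generator guided by the classification loss of a frozen target network) can be sketched as follows. This is a minimal illustrative example, not the authors' actual architecture: the module shapes, the `Generator` design, the perturbation bound, and the stand-in linear classifier are all assumptions, and the GAN discriminator/realism loss is omitted for brevity.

```python
import torch
import torch.nn as nn

NUM_CLASSES, FEAT_DIM = 10, 64  # illustrative sizes

class Generator(nn.Module):
    """Maps (speech feature, target-label embedding) -> adversarial perturbation."""
    def __init__(self):
        super().__init__()
        # Target label embedding: maps a class index into a feature vector
        # that conditions the generator on the desired (wrong) label.
        self.label_embed = nn.Embedding(NUM_CLASSES, FEAT_DIM)
        self.net = nn.Sequential(
            nn.Linear(2 * FEAT_DIM, 128), nn.ReLU(),
            nn.Linear(128, FEAT_DIM), nn.Tanh(),
        )

    def forward(self, x, target):
        cond = self.label_embed(target)              # (B, FEAT_DIM)
        return self.net(torch.cat([x, cond], dim=1))  # perturbation, (B, FEAT_DIM)

# Stand-in for a pretrained KWS classifier; frozen during generator training.
classifier = nn.Linear(FEAT_DIM, NUM_CLASSES)
for p in classifier.parameters():
    p.requires_grad_(False)

gen = Generator()
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

x = torch.randn(8, FEAT_DIM)                     # clean speech features (random stand-in)
target = torch.randint(0, NUM_CLASSES, (8,))     # desired target labels

for _ in range(5):
    x_adv = x + 0.1 * gen(x, target)             # bounded additive perturbation
    # Classification error of the frozen target network guides the generator:
    # gradients flow through the classifier but only the generator is updated.
    loss = ce(classifier(x_adv), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the full method, this targeted-classification loss would be combined with the CGAN adversarial loss so that the perturbed audio also remains perceptually close to the clean input.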
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant Nos. U1736215, 61672302, 61901237), the Zhejiang Natural Science Foundation (Grant Nos. LY20F020010, LY17F020010), the Ningbo Natural Science Foundation (Grant Nos. 2019A610103, 202003N4089), and the K.C. Wong Magna Fund at Ningbo University.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, D., Wang, R., Dong, L., Yan, D., Ren, Y. (2021). Efficient Generation of Speech Adversarial Examples with Generative Model. In: Zhao, X., Shi, YQ., Piva, A., Kim, H.J. (eds) Digital Forensics and Watermarking. IWDW 2020. Lecture Notes in Computer Science(), vol 12617. Springer, Cham. https://doi.org/10.1007/978-3-030-69449-4_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69448-7
Online ISBN: 978-3-030-69449-4