Efficient Generation of Speech Adversarial Examples with Generative Model

  • Conference paper
Digital Forensics and Watermarking (IWDW 2020)

Abstract

Deep neural network-based keyword spotting (KWS) has achieved tremendous success in smart speech assistant applications. However, neural network-based KWS systems have been shown to be vulnerable to adversarial examples. Efficient generation of adversarial examples can help mitigate these security flaws through adversarial training. In this paper, we propose using a conditional generative adversarial network (CGAN) to efficiently generate speech adversarial examples. Specifically, we first present a target label embedding method that maps class-wise labels into feature maps. We then use a generative adversarial network to construct targeted speech adversarial examples from these feature maps. The target KWS classification network is integrated into the CGAN framework: its classification error is back-propagated to guide the generator updates, while the target network itself remains frozen. The proposed method is evaluated on a set of state-of-the-art deep learning-based KWS classification networks. The results validate the effectiveness of the generated adversarial examples. In addition, the experimental results demonstrate that the generated adversarial examples transfer across different KWS classification networks.
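The guidance mechanism described in the abstract, where the classification error of a frozen target network drives the generator updates, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes a toy linear classifier in place of a deep KWS network, a four-sample "waveform", a tanh-bounded perturbation, and a per-class embedding row as the label conditioning. The discriminator and the GAN loss are omitted entirely. All names (`C`, `EPS`, `generate`, `train`) and all numeric values are hypothetical.

```python
import math

# Frozen toy "KWS classifier": logits = C @ x (3 keywords, 4-sample "waveform").
# Hypothetical weights; a real target would be a pretrained deep network.
C = [[1.0, -0.5, 0.3, 0.0],
     [-0.4, 0.8, 0.1, 0.6],
     [0.2, 0.1, -0.9, 0.5]]
K, T = len(C), len(C[0])
EPS = 0.5  # perturbation bound (assumed value)


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classify(x):
    """Frozen target network: returns class probabilities."""
    return softmax([sum(C[k][t] * x[t] for t in range(T)) for k in range(K)])


def generate(x, target, w, emb):
    """Generator: bounded perturbation conditioned on the target-label embedding."""
    delta = [EPS * math.tanh(w[t] * x[t] + emb[target][t]) for t in range(T)]
    return [x[t] + delta[t] for t in range(T)], delta


def train(x, target, steps=200, lr=0.1):
    w = [0.0] * T                         # generator weights (trainable)
    emb = [[0.0] * T for _ in range(K)]   # label embedding table (trainable)
    losses = []
    for _ in range(steps):
        x_adv, delta = generate(x, target, w, emb)
        p = classify(x_adv)
        losses.append(-math.log(p[target]))  # cross-entropy toward the target label
        # Gradient w.r.t. the logits, then back through the frozen weights C.
        dlogits = [p[k] - (1.0 if k == target else 0.0) for k in range(K)]
        dx = [sum(C[k][t] * dlogits[k] for k in range(K)) for t in range(T)]
        for t in range(T):
            # Chain rule through EPS * tanh(u): derivative is EPS * (1 - tanh(u)^2).
            du = dx[t] * EPS * (1.0 - (delta[t] / EPS) ** 2)
            w[t] -= lr * du * x[t]
            emb[target][t] -= lr * du
        # C is never updated: the target network stays frozen.
    return w, emb, losses


if __name__ == "__main__":
    x = [0.9, -0.2, 0.1, 0.3]
    w, emb, losses = train(x, target=2)
    x_adv, _ = generate(x, 2, w, emb)
    print("loss:", losses[0], "->", losses[-1])
    print("target-class probability:", classify(x_adv)[2])
```

Only the generator parameters `w` and `emb` receive updates. The classifier contributes gradients but is never modified, which mirrors the frozen-target setup the abstract describes. In a full CGAN, this classification loss would be combined with a discriminator loss that keeps the generated audio close to natural speech.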

Notes

  1. https://www.kaggle.com/c/tensorflow-speech-recognition-challenge

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant Nos. U1736215, 61672302, 61901237), the Zhejiang Natural Science Foundation (Grant Nos. LY20F020010, LY17F020010), the Ningbo Natural Science Foundation (Grant Nos. 2019A610103, 202003N4089), and the K.C. Wong Magna Fund in Ningbo University.

Corresponding author

Correspondence to Rangding Wang.

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this paper

Wang, D., Wang, R., Dong, L., Yan, D., Ren, Y. (2021). Efficient Generation of Speech Adversarial Examples with Generative Model. In: Zhao, X., Shi, YQ., Piva, A., Kim, H.J. (eds) Digital Forensics and Watermarking. IWDW 2020. Lecture Notes in Computer Science, vol 12617. Springer, Cham. https://doi.org/10.1007/978-3-030-69449-4_19

  • DOI: https://doi.org/10.1007/978-3-030-69449-4_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-69448-7

  • Online ISBN: 978-3-030-69449-4

  • eBook Packages: Computer Science (R0)
