Abstract
Deep neural network-based keyword spotting (KWS) has achieved tremendous success in smart speech assistant applications. However, neural network-based KWS has been shown to be susceptible to adversarial examples. Efficient adversarial example generation would help mitigate the security flaws of network-based KWS through adversarial training. In this paper, we propose using a conditional generative adversarial network (CGAN) to efficiently generate speech adversarial examples. Specifically, we first present a target label embedding method that maps each class label into feature maps. We then use a generative adversarial network to construct targeted speech adversarial examples from these feature maps. The target KWS classification network is integrated into the CGAN framework: its classification error is back-propagated via gradient flow to guide the generator updates, while the target network itself remains frozen. The proposed method is evaluated on a set of state-of-the-art deep learning-based KWS classification networks. The results validate the effectiveness of the generated adversarial examples. In addition, the experiments demonstrate the transferability of the generated adversarial examples across different KWS classification networks.
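The core training loop described in the abstract (a label-conditioned generator guided by the classification loss of a frozen target network) can be sketched as follows. This is a minimal illustrative example, not the authors' actual architecture: the module shapes, the `Generator` design, the perturbation bound, and the stand-in linear classifier are all assumptions, and the GAN discriminator/realism loss is omitted for brevity.

```python
import torch
import torch.nn as nn

NUM_CLASSES, FEAT_DIM = 10, 64  # illustrative sizes

class Generator(nn.Module):
    """Maps (speech feature, target-label embedding) -> adversarial perturbation."""
    def __init__(self):
        super().__init__()
        # Target label embedding: maps a class index into a feature vector
        # that conditions the generator on the desired (wrong) label.
        self.label_embed = nn.Embedding(NUM_CLASSES, FEAT_DIM)
        self.net = nn.Sequential(
            nn.Linear(2 * FEAT_DIM, 128), nn.ReLU(),
            nn.Linear(128, FEAT_DIM), nn.Tanh(),
        )

    def forward(self, x, target):
        cond = self.label_embed(target)              # (B, FEAT_DIM)
        return self.net(torch.cat([x, cond], dim=1))  # perturbation, (B, FEAT_DIM)

# Stand-in for a pretrained KWS classifier; frozen during generator training.
classifier = nn.Linear(FEAT_DIM, NUM_CLASSES)
for p in classifier.parameters():
    p.requires_grad_(False)

gen = Generator()
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

x = torch.randn(8, FEAT_DIM)                     # clean speech features (random stand-in)
target = torch.randint(0, NUM_CLASSES, (8,))     # desired target labels

for _ in range(5):
    x_adv = x + 0.1 * gen(x, target)             # bounded additive perturbation
    # Classification error of the frozen target network guides the generator:
    # gradients flow through the classifier but only the generator is updated.
    loss = ce(classifier(x_adv), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the full method, this targeted-classification loss would be combined with the CGAN adversarial loss so that the perturbed audio also remains perceptually close to the clean input.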
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant Nos. U1736215, 61672302, 61901237), the Zhejiang Natural Science Foundation (Grant Nos. LY20F020010, LY17F020010), the Ningbo Natural Science Foundation (Grant Nos. 2019A610103, 202003N4089), and the K.C. Wong Magna Fund at Ningbo University.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, D., Wang, R., Dong, L., Yan, D., Ren, Y. (2021). Efficient Generation of Speech Adversarial Examples with Generative Model. In: Zhao, X., Shi, YQ., Piva, A., Kim, H.J. (eds) Digital Forensics and Watermarking. IWDW 2020. Lecture Notes in Computer Science(), vol 12617. Springer, Cham. https://doi.org/10.1007/978-3-030-69449-4_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69448-7
Online ISBN: 978-3-030-69449-4