Generation of Black-box Audio Adversarial Examples Based on Gradient Approximation and Autoencoders

Published: 2 August 2022

Abstract

Deep Neural Networks (DNNs) are gaining popularity thanks to their ability to attain high accuracy and performance in various security-critical scenarios. However, recent research shows that DNN-based Automatic Speech Recognition (ASR) systems are vulnerable to adversarial attacks. These attacks typically formulate adversarial example generation as an iterative, optimization-based process. Although such attacks have made significant progress, they still require long generation times, which makes them difficult to launch in real-world scenarios. In this article, we propose a real-time attack framework that uses a neural network, trained with a gradient approximation method, to generate adversarial examples against Keyword Spotting (KWS) systems. Experimental results show that the generated adversarial examples can fool a black-box KWS system into producing incorrect outputs with only a single inference. Compared with previous work, our attack achieves a higher success rate in less than 0.004 s. We further extend this work by presenting a novel ensemble audio adversarial attack and evaluating it against KWS systems equipped with existing defense mechanisms. Promising experimental results support the efficacy of the proposed attack.
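The paper's full pipeline (training a generator offline so that attack-time generation is a single forward pass) is not reproduced here. As a hedged illustration of the core building block — approximating gradients of a black-box classifier using only score queries, in the spirit of zeroth-order (ZOO-style) two-point estimates — the following NumPy sketch crafts a small targeted perturbation on a toy input. Everything in it is hypothetical: the linear stand-in classifier `blackbox_scores`, the helper names `attack_loss` and `zo_gradient`, and all hyperparameters are illustrative choices, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a black-box KWS classifier: the attacker can query class
# scores but has no access to W or to true gradients.
W = rng.standard_normal((8, 100))

def blackbox_scores(x):
    return W @ x  # scores for 8 hypothetical keyword classes

def attack_loss(x, target):
    # Cross-entropy toward the attacker's target class (lower = closer).
    s = blackbox_scores(x)
    s = s - s.max()
    p = np.exp(s) / np.exp(s).sum()
    return -np.log(p[target] + 1e-12)

def zo_gradient(x, target, sigma=1e-3, n_samples=64):
    """Two-point zeroth-order gradient estimate over random directions."""
    g = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        delta = (attack_loss(x + sigma * u, target) -
                 attack_loss(x - sigma * u, target)) / (2 * sigma)
        g += delta * u
    return g / n_samples

x = rng.standard_normal(100) * 0.1   # toy "audio" feature vector
target = 3                           # class the attacker wants the KWS to emit

adv = x.copy()
for _ in range(200):
    adv = adv - 0.01 * zo_gradient(adv, target)
    # Keep the perturbation bounded so the audio stays close to the original.
    adv = x + np.clip(adv - x, -0.5, 0.5)
```

In the attack described by the abstract, estimates like `zo_gradient` would be used once, offline, to train a generator network; at attack time that network produces the perturbation in a single inference, avoiding this per-input iterative loop.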



Published in
ACM Journal on Emerging Technologies in Computing Systems, Volume 18, Issue 3
July 2022, 428 pages
ISSN: 1550-4832
EISSN: 1550-4840
DOI: 10.1145/3508463
Editor: Ramesh Karri

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

      Publication History

      • Published: 2 August 2022
      • Online AM: 25 March 2022
      • Accepted: 1 August 2021
      • Revised: 1 July 2021
      • Received: 1 December 2020
