Generation of Black-box Audio Adversarial Examples Based on Gradient Approximation and Autoencoders

Published: 2 August 2022

Abstract

Deep Neural Networks (DNNs) are gaining popularity thanks to their ability to attain high accuracy and performance in various security-critical scenarios. However, recent research shows that DNN-based Automatic Speech Recognition (ASR) systems are vulnerable to adversarial attacks. These attacks typically formulate adversarial example generation as an iterative, optimization-based process. Although such attacks have made significant progress, they still require long generation times, which makes them difficult to launch in real-world scenarios. In this article, we propose a real-time attack framework that uses a neural network, trained with a gradient approximation method, to generate adversarial examples against Keyword Spotting (KWS) systems. Experimental results show that the generated adversarial examples can fool a black-box KWS system into producing incorrect outputs with only a single inference. Compared with previous work, our attack achieves a higher success rate in less than 0.004 s. We further extend this work by presenting a novel ensemble audio adversarial attack and evaluating it against KWS systems equipped with existing defense mechanisms. Promising experimental results support the efficacy of the proposed attack.
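The paper's full pipeline (training a generator offline so that attack-time generation is a single forward pass) is not reproduced here. As a hedged illustration of the core building block — approximating gradients of a black-box classifier using only score queries, in the spirit of zeroth-order (ZOO-style) two-point estimates — the following NumPy sketch crafts a small targeted perturbation on a toy input. Everything in it is hypothetical: the linear stand-in classifier `blackbox_scores`, the helper names `attack_loss` and `zo_gradient`, and all hyperparameters are illustrative choices, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a black-box KWS classifier: the attacker can query class
# scores but has no access to W or to true gradients.
W = rng.standard_normal((8, 100))

def blackbox_scores(x):
    return W @ x  # scores for 8 hypothetical keyword classes

def attack_loss(x, target):
    # Cross-entropy toward the attacker's target class (lower = closer).
    s = blackbox_scores(x)
    s = s - s.max()
    p = np.exp(s) / np.exp(s).sum()
    return -np.log(p[target] + 1e-12)

def zo_gradient(x, target, sigma=1e-3, n_samples=64):
    """Two-point zeroth-order gradient estimate over random directions."""
    g = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        delta = (attack_loss(x + sigma * u, target) -
                 attack_loss(x - sigma * u, target)) / (2 * sigma)
        g += delta * u
    return g / n_samples

x = rng.standard_normal(100) * 0.1   # toy "audio" feature vector
target = 3                           # class the attacker wants the KWS to emit

adv = x.copy()
for _ in range(200):
    adv = adv - 0.01 * zo_gradient(adv, target)
    # Keep the perturbation bounded so the audio stays close to the original.
    adv = x + np.clip(adv - x, -0.5, 0.5)
```

In the attack described by the abstract, estimates like `zo_gradient` would be used once, offline, to train a generator network; at attack time that network produces the perturbation in a single inference, avoiding this per-input iterative loop.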



Published in
ACM Journal on Emerging Technologies in Computing Systems, Volume 18, Issue 3
July 2022, 428 pages
ISSN: 1550-4832
EISSN: 1550-4840
DOI: 10.1145/3508463
Editor: Ramesh Karri

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

      Publication History

      • Published: 2 August 2022
      • Online AM: 25 March 2022
      • Accepted: 1 August 2021
      • Revised: 1 July 2021
      • Received: 1 December 2020
