Abstract
Deep Neural Networks (DNNs) are gaining popularity thanks to their ability to attain high accuracy and performance in various security-critical scenarios. However, recent research shows that DNN-based Automatic Speech Recognition (ASR) systems are vulnerable to adversarial attacks. Most of these attacks formulate adversarial example generation as an iterative, optimization-based process. Although such attacks have made significant progress, they still require substantial generation time, which makes them difficult to launch in real-world scenarios. In this article, we propose a real-time attack framework that uses a neural network, trained with a gradient approximation method, to generate adversarial examples against Keyword Spotting (KWS) systems. Experimental results show that the generated adversarial examples can easily fool a black-box KWS system into producing incorrect outputs with only a single inference. Compared with previous work, our attack achieves a higher success rate with a generation time of less than 0.004 s. We further extend this work by presenting a novel ensemble audio adversarial attack and by evaluating the attack against KWS systems equipped with existing defense mechanisms. The efficacy of the proposed attack is well supported by promising experimental results.
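The core idea, training a generator offline with approximated gradients so that attack time collapses to a single forward pass, can be sketched as follows. This is a minimal illustration and not the authors' implementation: `BlackBoxKWS`, `AdvAutoencoder`, and every hyperparameter below are hypothetical stand-ins, and a generic NES-style finite-difference estimator stands in for the paper's gradient approximation method.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the victim KWS model: it may be queried
# for scores but never differentiated (the black-box assumption).
class BlackBoxKWS(nn.Module):
    def __init__(self, n_classes=12, n_samples=16000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_samples, 128), nn.ReLU(), nn.Linear(128, n_classes)
        )

    @torch.no_grad()
    def query(self, audio):  # returns class scores only, no gradients
        return self.net(audio)

# Small autoencoder that maps clean audio to a bounded adversarial version.
class AdvAutoencoder(nn.Module):
    def __init__(self, n_samples=16000, bottleneck=256, eps=0.05):
        super().__init__()
        self.eps = eps  # illustrative perturbation budget
        self.enc = nn.Sequential(nn.Linear(n_samples, bottleneck), nn.ReLU())
        self.dec = nn.Linear(bottleneck, n_samples)

    def forward(self, audio):
        perturb = self.eps * torch.tanh(self.dec(self.enc(audio)))
        return torch.clamp(audio + perturb, -1.0, 1.0)

def attack_loss(victim, adv, labels):
    """Untargeted loss: minimizing it pushes the true-class score down."""
    scores = victim.query(adv)
    return scores.gather(1, labels[:, None]).squeeze(1).mean()

def estimated_grad(victim, adv, labels, sigma=1e-3, n_dirs=32):
    """NES-style zeroth-order estimate of d(loss)/d(adv), queries only."""
    grad = torch.zeros_like(adv)
    for _ in range(n_dirs):
        u = torch.randn_like(adv)
        l_pos = attack_loss(victim, adv + sigma * u, labels)
        l_neg = attack_loss(victim, adv - sigma * u, labels)
        grad += (l_pos - l_neg) / (2 * sigma) * u
    return grad / n_dirs

victim = BlackBoxKWS()
gen = AdvAutoencoder()
opt = torch.optim.Adam(gen.parameters(), lr=1e-4)

for step in range(100):                    # illustrative training budget
    audio = torch.rand(8, 16000) * 2 - 1   # placeholder clean batch in [-1, 1]
    labels = torch.randint(0, 12, (8,))
    adv = gen(audio)
    g = estimated_grad(victim, adv.detach(), labels)
    opt.zero_grad()
    adv.backward(g)  # chain the estimated gradient through the generator
    opt.step()

# After training, a single forward pass yields an adversarial example.
adv_example = gen(torch.rand(1, 16000) * 2 - 1)
```

The design point this sketch illustrates is that the victim model is only ever queried, never differentiated: the estimated gradient with respect to the generator's output is injected into autograd via `adv.backward(g)`, so the expensive query-based estimation is paid once during offline training, while deployment needs only one inference per adversarial example.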