Abstract
Machine learning systems are ubiquitous in our lives, so studying their vulnerabilities is necessary to improve their reliability and security. In recent years, adversarial example attacks have attracted considerable attention and have achieved remarkable success in fooling machine learning systems, especially in computer vision. For automatic speech recognition (ASR) models, current state-of-the-art attacks focus mainly on white-box methods, which assume that the adversary has full access to the internals of the model. However, this assumption rarely holds in practice. Existing black-box attack methods suffer from low attack success rates, perceptible adversarial examples, and long computation times, so constructing black-box adversarial examples for ASR systems remains a very challenging problem. In this paper, we explore the effectiveness of adversarial attacks against ASR systems. Inspired by psychoacoustic models, we design the Imperceptible Genetic Algorithm (IMPGA) attack, which combines the psychoacoustic principle of auditory masking with a genetic algorithm to address this problem. In addition, we propose an adaptive auditory-masking coefficient, applied in the genetic algorithm's fitness function, to balance the attack success rate against the imperceptibility of the generated adversarial examples. Experimental results show that our method achieves a 38% targeted attack success rate while maintaining 92.73% audio similarity and reducing the required computation time. We also demonstrate the effectiveness of each improvement through ablation experiments.
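To make the setting concrete, the Python sketch below illustrates the general shape of such a black-box genetic-algorithm attack: candidate perturbations are scored by a fitness function that combines a target-transcription score returned by the attacked ASR system with an imperceptibility penalty whose weight is adapted over generations. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: `query_asr_score` is a hypothetical placeholder for the black-box ASR query, and the simple energy penalty stands in for the psychoacoustic auditory-masking threshold used in IMPGA.

```python
import numpy as np

POP_SIZE = 50          # candidate perturbations per generation
MUTATION_STD = 0.002   # std-dev of Gaussian mutation noise (audio in [-1, 1])
ELITE_FRAC = 0.2       # fraction of top candidates kept as parents


def query_asr_score(audio: np.ndarray, target: str) -> float:
    """Hypothetical black-box query: higher means the ASR output is closer
    to the target phrase (e.g., a negative CTC-loss or edit-distance score)."""
    raise NotImplementedError("replace with a query to the attacked ASR system")


def imperceptibility_penalty(delta: np.ndarray) -> float:
    """Toy stand-in for a psychoacoustic masking penalty: perturbation energy."""
    return float(np.sum(delta ** 2))


def fitness(original: np.ndarray, delta: np.ndarray, target: str, alpha: float) -> float:
    """Target-transcription score minus an adaptively weighted imperceptibility penalty."""
    candidate = np.clip(original + delta, -1.0, 1.0)
    return query_asr_score(candidate, target) - alpha * imperceptibility_penalty(delta)


def evolve(original: np.ndarray, target: str, generations: int = 500) -> np.ndarray:
    """Run the genetic algorithm and return the best adversarial candidate."""
    rng = np.random.default_rng(0)
    pop = rng.normal(0.0, MUTATION_STD, size=(POP_SIZE, original.size))
    alpha = 0.0  # adaptive coefficient: start loose, tighten imperceptibility over time
    for _ in range(generations):
        scores = np.array([fitness(original, d, target, alpha) for d in pop])
        elites = pop[np.argsort(scores)[-int(ELITE_FRAC * POP_SIZE):]]
        # Crossover: mix two random elite parents per child, then mutate.
        parents_a = elites[rng.integers(len(elites), size=POP_SIZE)]
        parents_b = elites[rng.integers(len(elites), size=POP_SIZE)]
        mask = rng.random((POP_SIZE, original.size)) < 0.5
        pop = np.where(mask, parents_a, parents_b)
        pop += rng.normal(0.0, MUTATION_STD, size=pop.shape)
        alpha = min(1.0, alpha + 1.0 / generations)  # gradually raise the penalty weight
    best = pop[int(np.argmax([fitness(original, d, target, alpha) for d in pop]))]
    return np.clip(original + best, -1.0, 1.0)
```

In the paper, the adaptive coefficient and a masking-threshold-based penalty replace the toy pieces above; the sketch is only meant to convey the overall loop structure (selection, crossover, mutation, and fitness re-weighting) that such black-box attacks share.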
Acknowledgments
This work is supported by the National Key R&D Program of China (2021YFF0602104-2).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liang, L., Guo, B., Lian, Z., Li, Q., Jing, H. (2023). IMPGA: An Effective and Imperceptible Black-Box Attack Against Automatic Speech Recognition Systems. In: Li, B., Yue, L., Tao, C., Han, X., Calvanese, D., Amagasa, T. (eds) Web and Big Data. APWeb-WAIM 2022. Lecture Notes in Computer Science, vol 13423. Springer, Cham. https://doi.org/10.1007/978-3-031-25201-3_27
DOI: https://doi.org/10.1007/978-3-031-25201-3_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25200-6
Online ISBN: 978-3-031-25201-3