Abstract
Machine learning systems are ubiquitous in our lives, so studying their vulnerabilities is necessary to improve their reliability and security. In recent years, adversarial example attacks have attracted considerable attention and have achieved remarkable success in fooling machine learning systems, especially in computer vision. For automatic speech recognition (ASR) models, current state-of-the-art attacks focus mainly on white-box methods, which assume that the adversary has full access to the internals of the model. However, this assumption rarely holds in practice. Existing black-box attack methods suffer from low attack success rates, perceptible adversarial examples, and long computation times, so constructing black-box adversarial examples for ASR systems remains a very challenging problem. In this paper, we explore the effectiveness of adversarial attacks against ASR systems. Inspired by psychoacoustic models, we design the Imperceptible Genetic Algorithm (IMPGA) attack, which combines the psychoacoustic principle of auditory masking with a genetic algorithm to address this problem. In addition, we propose an adaptive auditory-masking coefficient, applied in the genetic algorithm's fitness function, to balance the attack success rate against the imperceptibility of the generated adversarial examples. Experimental results show that our method achieves a 38% targeted attack success rate while maintaining 92.73% audio similarity and reducing the required computation time. We also demonstrate the effectiveness of each improvement through ablation experiments.
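To make the setting concrete, the Python sketch below illustrates the general shape of such a black-box genetic-algorithm attack: candidate perturbations are scored by a fitness function that combines a target-transcription score returned by the attacked ASR system with an imperceptibility penalty whose weight is adapted over generations. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: `query_asr_score` is a hypothetical placeholder for the black-box ASR query, and the simple energy penalty stands in for the psychoacoustic auditory-masking threshold used in IMPGA.

```python
import numpy as np

POP_SIZE = 50          # candidate perturbations per generation
MUTATION_STD = 0.002   # std-dev of Gaussian mutation noise (audio in [-1, 1])
ELITE_FRAC = 0.2       # fraction of top candidates kept as parents


def query_asr_score(audio: np.ndarray, target: str) -> float:
    """Hypothetical black-box query: higher means the ASR output is closer
    to the target phrase (e.g., a negative CTC-loss or edit-distance score)."""
    raise NotImplementedError("replace with a query to the attacked ASR system")


def imperceptibility_penalty(delta: np.ndarray) -> float:
    """Toy stand-in for a psychoacoustic masking penalty: perturbation energy."""
    return float(np.sum(delta ** 2))


def fitness(original: np.ndarray, delta: np.ndarray, target: str, alpha: float) -> float:
    """Target-transcription score minus an adaptively weighted imperceptibility penalty."""
    candidate = np.clip(original + delta, -1.0, 1.0)
    return query_asr_score(candidate, target) - alpha * imperceptibility_penalty(delta)


def evolve(original: np.ndarray, target: str, generations: int = 500) -> np.ndarray:
    """Run the genetic algorithm and return the best adversarial candidate."""
    rng = np.random.default_rng(0)
    pop = rng.normal(0.0, MUTATION_STD, size=(POP_SIZE, original.size))
    alpha = 0.0  # adaptive coefficient: start loose, tighten imperceptibility over time
    for _ in range(generations):
        scores = np.array([fitness(original, d, target, alpha) for d in pop])
        elites = pop[np.argsort(scores)[-int(ELITE_FRAC * POP_SIZE):]]
        # Crossover: mix two random elite parents per child, then mutate.
        parents_a = elites[rng.integers(len(elites), size=POP_SIZE)]
        parents_b = elites[rng.integers(len(elites), size=POP_SIZE)]
        mask = rng.random((POP_SIZE, original.size)) < 0.5
        pop = np.where(mask, parents_a, parents_b)
        pop += rng.normal(0.0, MUTATION_STD, size=pop.shape)
        alpha = min(1.0, alpha + 1.0 / generations)  # gradually raise the penalty weight
    best = pop[int(np.argmax([fitness(original, d, target, alpha) for d in pop]))]
    return np.clip(original + best, -1.0, 1.0)
```

In the paper, the adaptive coefficient and a masking-threshold-based penalty replace the toy pieces above; the sketch is only meant to convey the overall loop structure (selection, crossover, mutation, and fitness re-weighting) that such black-box attacks share.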
Acknowledgments
This work is supported by the National Key R&D Program of China (2021YFF0602104-2).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liang, L., Guo, B., Lian, Z., Li, Q., Jing, H. (2023). IMPGA: An Effective and Imperceptible Black-Box Attack Against Automatic Speech Recognition Systems. In: Li, B., Yue, L., Tao, C., Han, X., Calvanese, D., Amagasa, T. (eds) Web and Big Data. APWeb-WAIM 2022. Lecture Notes in Computer Science, vol 13423. Springer, Cham. https://doi.org/10.1007/978-3-031-25201-3_27
DOI: https://doi.org/10.1007/978-3-031-25201-3_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25200-6
Online ISBN: 978-3-031-25201-3