Abstract
In the automatic speech recognition (ASR) domain, most, if not all, current audio adversarial examples (AEs) are generated by applying perturbations to input audio. Adversaries either constrain the norm of the perturbations or hide the perturbations below the human hearing threshold based on psychoacoustics. Both approaches have problems: norm-constrained perturbations introduce noticeable noise, while perturbations hidden below the hearing threshold can be defeated by deliberately removing inaudible components from the audio. In this paper, we present a novel method of generating targeted audio AEs. The perceptual quality of our audio AEs is significantly better than that of audio AEs generated by applying norm-constrained perturbations. Furthermore, unlike approaches that rely on psychoacoustics to hide perturbations below the hearing threshold, our audio AEs can still be successfully generated even when inaudible components are removed from the audio.
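The norm-constrained baseline the abstract contrasts against is typically an iterative gradient attack: optimize a targeted loss over the audio while keeping the perturbation inside a norm ball. The following is a minimal illustrative sketch only, not the paper's method: it uses a toy linear classifier as a stand-in for an ASR network (a real attack would backpropagate through a model such as DeepSpeech and a CTC loss), with a projected sign-gradient loop enforcing an L-infinity budget.

```python
import numpy as np

# Toy differentiable "model": scores = W @ x. This linear stand-in is an
# assumption for illustration; a real audio AE attack differentiates
# through a full ASR network.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 16))   # 3 "transcript" classes, 16 audio samples
audio = rng.normal(size=16)    # clean input waveform (toy)
target = 2                     # adversarial target class

def loss_and_grad(x):
    """Cross-entropy toward the target class, and its gradient w.r.t. x."""
    scores = W @ x
    p = np.exp(scores - scores.max())
    p /= p.sum()
    loss = -np.log(p[target])
    grad = W.T @ (p - np.eye(3)[target])  # d(loss)/dx for softmax CE
    return loss, grad

eps, step = 0.5, 0.05          # L-inf norm budget and step size
delta = np.zeros_like(audio)
for _ in range(300):
    _, g = loss_and_grad(audio + delta)
    delta -= step * np.sign(g)               # step toward the target label
    delta = np.clip(delta, -eps, eps)        # project back into the norm ball

adv = audio + delta
print("predicted class:", int(np.argmax(W @ adv)))
```

The `eps` clip is exactly the norm constraint the abstract criticizes: a budget large enough to flip the transcription is often large enough to be heard as noise, which motivates the paper's alternative.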
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Zong, W., Chow, YW., Susilo, W. (2022). High Quality Audio Adversarial Examples Without Using Psychoacoustics. In: Chen, X., Shen, J., Susilo, W. (eds) Cyberspace Safety and Security. CSS 2022. Lecture Notes in Computer Science, vol 13547. Springer, Cham. https://doi.org/10.1007/978-3-031-18067-5_12
Print ISBN: 978-3-031-18066-8
Online ISBN: 978-3-031-18067-5