
High Quality Audio Adversarial Examples Without Using Psychoacoustics

  • Conference paper
Cyberspace Safety and Security (CSS 2022)

Abstract

In the automatic speech recognition (ASR) domain, most, if not all, current audio adversarial examples (AEs) are generated by applying perturbations to input audio. Adversaries either constrain the norm of the perturbations or hide the perturbations below the hearing threshold based on psychoacoustics. These two approaches have their respective problems: norm-constrained perturbations introduce noticeable noise, while hiding perturbations below the hearing threshold can be prevented by deliberately removing inaudible components from the audio. In this paper, we present a novel method of generating targeted audio AEs. The perceptual quality of our audio AEs is significantly better than that of audio AEs generated by applying norm-constrained perturbations. Furthermore, unlike approaches that rely on psychoacoustics to hide perturbations below the hearing threshold, we show that our audio AEs can still be successfully generated even when inaudible components are removed from the audio.
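For context, the norm-constrained generation that the abstract contrasts against is usually formulated as finding a perturbation delta, bounded in L-infinity norm, such that the ASR model transcribes x + delta as the attacker's target phrase (the Carlini and Wagner style of targeted audio attack). The sketch below illustrates only that baseline formulation, not the method proposed in this paper; it assumes a PyTorch ASR model that maps a waveform in [-1, 1] to CTC log-probabilities of shape (T, N, C), and all names and hyperparameters are illustrative.

```python
# Illustrative sketch of a norm-constrained targeted audio AE (baseline approach
# the abstract argues against). The model interface and hyperparameters are
# assumptions, not the authors' implementation.
import torch
import torch.nn.functional as F

def norm_constrained_attack(model, x, target_ids, eps=0.05, steps=1000, lr=1e-3):
    """Search for delta with ||delta||_inf <= eps so that x + delta is
    transcribed as the target token sequence (CTC loss as the surrogate)."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)          # Adam, common in audio-AE work
    for _ in range(steps):
        adv = torch.clamp(x + delta, -1.0, 1.0)     # keep a valid waveform range
        log_probs = model(adv)                      # assumed shape: (T, N, C)
        input_lens = torch.full((log_probs.size(1),), log_probs.size(0), dtype=torch.long)
        target_lens = torch.tensor([target_ids.size(0)], dtype=torch.long)
        loss = F.ctc_loss(log_probs, target_ids.unsqueeze(0), input_lens, target_lens)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():                       # project back into the L_inf ball
            delta.clamp_(-eps, eps)
    return torch.clamp(x + delta.detach(), -1.0, 1.0)
```

The tension the abstract highlights follows directly from this formulation: shrinking eps reduces audible noise but makes the attack harder, while enlarging it makes the perturbation clearly audible.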

Notes

  1. https://github.com/mozilla/DeepSpeech.

  2. https://github.com/ludlows/python-pesq.
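The python-pesq package in the second note presumably serves to score the perceptual quality of the generated AEs against the original recordings. A minimal usage sketch, assuming 16 kHz mono waveforms and placeholder file names (soundfile is used here only for loading audio):

```python
# Minimal PESQ scoring sketch using the python-pesq package (pip install pesq).
# File paths and the soundfile dependency are illustrative assumptions.
import soundfile as sf
from pesq import pesq

ref, fs = sf.read("original.wav")       # reference (clean) audio, 16 kHz mono
deg, _ = sf.read("adversarial.wav")     # degraded audio, e.g. an audio AE

# Wide-band PESQ; higher scores indicate better perceptual quality.
score = pesq(fs, ref, deg, "wb")
print(f"PESQ (wide-band): {score:.2f}")
```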

Author information

Corresponding author

Correspondence to Wei Zong.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zong, W., Chow, YW., Susilo, W. (2022). High Quality Audio Adversarial Examples Without Using Psychoacoustics. In: Chen, X., Shen, J., Susilo, W. (eds) Cyberspace Safety and Security. CSS 2022. Lecture Notes in Computer Science, vol 13547. Springer, Cham. https://doi.org/10.1007/978-3-031-18067-5_12

  • DOI: https://doi.org/10.1007/978-3-031-18067-5_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-18066-8

  • Online ISBN: 978-3-031-18067-5

  • eBook Packages: Computer Science, Computer Science (R0)
