Comparing Unsupervised Detection Algorithms for Audio Adversarial Examples

  • Conference paper
  • Speech and Computer (SPECOM 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13721)

Abstract

Recent work on automatic speech recognition (ASR) systems has shown that the underlying neural networks are vulnerable to so-called adversarial examples. Various defense mechanisms have been proposed to counter these attacks, but most of them rely on supervised learning, which requires substantial resources. In this research, we present and compare several unsupervised learning methods for detecting audio adversarial examples (including an autoencoder, a variational autoencoder (VAE), a one-class support vector machine (OCSVM), and an isolation forest), none of which requires adversarial examples in the training data. Our experimental results show that some of these methods, e.g., the isolation forest, successfully defend against a simple adversarial attack. Even in a more elaborate attack scenario that exploits human psychoacoustics, we still achieve a high detection rate, e.g., with an autoencoder, at the cost of a slightly increased false positive rate. We expect our detailed analysis to serve as a helpful baseline for further research on defenses against audio adversarial examples.
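The abstract describes detectors that are trained on benign audio only and then judged by their detection rate and false positive rate. The sketch below illustrates that general workflow with a minimal, pure-Python isolation forest (the standard random axis-aligned splitting algorithm) plus a threshold calibrated on held-out benign scores. It is not the authors' implementation: it assumes each utterance has already been reduced to a fixed-length feature vector, and the synthetic 2-D Gaussian "features", the 5% target false positive rate, and the helper names `TinyIsolationForest`, `calibrate_threshold`, `rates`, and `gaussian_points` are all hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def avg_path_length(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random axis-aligned splits."""
    n = len(points)
    if depth >= max_depth or n <= 1:
        return ("leaf", n)
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop if the chosen feature is constant here
        return ("leaf", n)
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, node, depth=0):
    """Depth of the leaf reached by x, plus c(size) for leaves holding several points."""
    if node[0] == "leaf":
        return depth + avg_path_length(node[1])
    _, dim, split, left, right = node
    return path_length(x, left if x[dim] < split else right, depth + 1)


class TinyIsolationForest:  # hypothetical helper, not a library class
    def __init__(self, n_trees=100, sample_size=256, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.rng = random.Random(seed)

    def fit(self, benign):
        """Fit on benign feature vectors only; no adversarial examples needed."""
        self.psi = min(self.sample_size, len(benign))
        max_depth = math.ceil(math.log2(max(self.psi, 2)))
        self.trees = [build_tree(self.rng.sample(benign, self.psi), 0, max_depth, self.rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x):
        """Anomaly score in (0, 1]; higher means easier to isolate (more anomalous)."""
        mean_depth = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_depth / (avg_path_length(self.psi) or 1.0))


def calibrate_threshold(benign_scores, target_fpr=0.05):
    """Pick the (1 - target_fpr) quantile of held-out benign scores."""
    ranked = sorted(benign_scores)
    k = min(len(ranked) - 1, math.ceil((1.0 - target_fpr) * len(ranked)) - 1)
    return ranked[max(k, 0)]


def rates(benign_scores, adversarial_scores, threshold):
    """Detection rate (flagged adversarial) and false positive rate (flagged benign)."""
    dr = sum(s > threshold for s in adversarial_scores) / len(adversarial_scores)
    fpr = sum(s > threshold for s in benign_scores) / len(benign_scores)
    return dr, fpr


def gaussian_points(n, mean, rng):
    # Hypothetical stand-in for real per-utterance audio features (assumption).
    return [[rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)] for _ in range(n)]


rng = random.Random(42)
train = gaussian_points(256, 0.0, rng)        # benign only
val = gaussian_points(200, 0.0, rng)          # benign, for threshold calibration
test_benign = gaussian_points(200, 0.0, rng)
test_adv = gaussian_points(200, 4.0, rng)     # shifted cluster plays "adversarial"

forest = TinyIsolationForest(n_trees=100, sample_size=256, seed=0).fit(train)
threshold = calibrate_threshold([forest.score(x) for x in val], target_fpr=0.05)
detection_rate, fpr = rates([forest.score(x) for x in test_benign],
                            [forest.score(x) for x in test_adv], threshold)
print(f"detection rate={detection_rate:.2f}, FPR={fpr:.2f}")
```

In practice a library implementation such as scikit-learn's `IsolationForest` (or `OneClassSVM` for the OCSVM variant) would normally replace the hand-written forest. The same loop of training on benign data, calibrating a threshold, and reporting detection rate against false positive rate also applies to autoencoder or VAE detectors if the anomaly score is taken to be reconstruction error. Lowering the threshold raises the detection rate but also the false positive rate, which is the trade-off the abstract reports for the psychoacoustic attack.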

Notes

  1. https://huggingface.co/speechbrain/asr-crdnn-transformerlm-librispeech

  2. https://github.com/timherng/audio_adversarial_examples

Acknowledgment

This research was supported by the Bavarian Ministry of Economic Affairs, Regional Development and Energy.

Author information

Correspondence to Ching-Yu Kao.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Choosaksakunwiboon, S., Pizzi, K., Kao, CY. (2022). Comparing Unsupervised Detection Algorithms for Audio Adversarial Examples. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds) Speech and Computer. SPECOM 2022. Lecture Notes in Computer Science, vol 13721. Springer, Cham. https://doi.org/10.1007/978-3-031-20980-2_11

  • DOI: https://doi.org/10.1007/978-3-031-20980-2_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20979-6

  • Online ISBN: 978-3-031-20980-2

  • eBook Packages: Computer Science, Computer Science (R0)
