Abstract
Recent work on automatic speech recognition (ASR) has shown that the underlying neural networks are vulnerable to so-called adversarial examples. To counter these attacks, various defense mechanisms have been proposed. Most defenses discussed so far rely on supervised learning and therefore require adversarial examples to be generated and labeled for training, which is resource-intensive. In this work, we present and compare several unsupervised learning methods for detecting audio adversarial examples, including the autoencoder, the variational autoencoder (VAE), the one-class support vector machine (OCSVM), and the isolation forest, none of which require adversarial examples in the training data. Our experimental results show that some of the considered methods, e.g., the isolation forest, successfully defend against a simple adversarial attack. Even in a more elaborate attack scenario that exploits human psychoacoustics, we still achieve a high detection rate, e.g., with an autoencoder, at the cost of a slightly increased false positive rate. We expect our detailed analysis to serve as a helpful baseline for further research on defenses against audio adversarial examples.
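To make the detection setting concrete, below is a minimal sketch of one of the listed methods: an isolation forest fitted only on benign speech. The MFCC summary features, scikit-learn usage, hyperparameters, and file paths are illustrative assumptions rather than the paper's exact configuration; an autoencoder or OCSVM detector would expose the same train-on-benign, flag-outliers interface.

```python
# Minimal sketch of unsupervised adversarial-audio detection.
# Assumptions (not the paper's exact setup): per-clip MFCC mean/std
# features, scikit-learn's IsolationForest, hypothetical file paths.
import numpy as np
import librosa
from sklearn.ensemble import IsolationForest

def mfcc_features(wav_path, sr=16000, n_mfcc=20):
    """Load a clip and summarize it as mean/std of its MFCCs (one vector per clip)."""
    audio, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Train only on benign speech -- no adversarial examples are needed.
benign_paths = ["benign_0001.wav", "benign_0002.wav"]  # hypothetical clean clips
X_train = np.stack([mfcc_features(p) for p in benign_paths])

detector = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
detector.fit(X_train)

def is_adversarial(wav_path):
    """Flag a clip as adversarial if the forest scores it as an outlier (-1)."""
    x = mfcc_features(wav_path).reshape(1, -1)
    return detector.predict(x)[0] == -1
```

The property shared by all four compared methods is that training sees only clean data, so no specific attack has to be anticipated when the detector is built.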
Acknowledgment
This research was supported by the Bavarian Ministry of Economic Affairs, Regional Development and Energy.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Choosaksakunwiboon, S., Pizzi, K., Kao, C.Y. (2022). Comparing Unsupervised Detection Algorithms for Audio Adversarial Examples. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds.) Speech and Computer. SPECOM 2022. Lecture Notes in Computer Science, vol. 13721. Springer, Cham. https://doi.org/10.1007/978-3-031-20980-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20979-6
Online ISBN: 978-3-031-20980-2