Abstract
Large pre-trained foundation models have recently become widely adopted by machine learning practitioners for a multitude of tasks. Because such models are publicly available, relying on them as backbones for downstream tasks can leave systems highly vulnerable to adversarial attacks crafted with the same public model. In this work, we propose Robustness Tokens, a novel approach specific to the transformer architecture that, rather than tuning model parameters as in traditional adversarial training, fine-tunes a few additional private tokens at low computational cost. We show that Robustness Tokens make Vision Transformer models significantly more robust to white-box adversarial attacks while retaining the original downstream performance.
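The abstract does not spell out the training objective, so the following is a minimal PyTorch sketch of the general idea only: a handful of learnable tokens are appended to a frozen backbone's input sequence, and those tokens alone are optimized so that features under perturbation stay close to clean features. The class name `RobustnessTokens`, the toy `nn.TransformerEncoder` backbone, the sign-noise perturbation, and the feature-matching loss are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RobustnessTokens(nn.Module):
    """Appends a few learnable private tokens to the input sequence of a
    frozen transformer backbone; only these tokens receive gradients."""

    def __init__(self, backbone: nn.Module, embed_dim: int, num_tokens: int = 4):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():  # backbone weights stay frozen
            p.requires_grad_(False)
        self.tokens = nn.Parameter(0.02 * torch.randn(1, num_tokens, embed_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) token sequence, e.g. ViT patch embeddings.
        extra = self.tokens.expand(x.shape[0], -1, -1)
        return self.backbone(torch.cat([x, extra], dim=1))

# Toy stand-in for a pre-trained ViT encoder (illustration only).
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
model = RobustnessTokens(nn.TransformerEncoder(layer, num_layers=2), embed_dim=64)

x = torch.randn(8, 16, 64)                          # clean token sequences
x_adv = x + (8 / 255) * torch.randn_like(x).sign()  # crude perturbation stand-in
opt = torch.optim.Adam([model.tokens], lr=1e-3)     # only the tokens are trained

for _ in range(10):
    clean = model(x).detach()      # target features, no gradient
    adv = model(x_adv)             # features under perturbation
    loss = F.mse_loss(adv, clean)  # assumed objective: match clean features
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Note that only `num_tokens × embed_dim` parameters are ever updated, which is consistent with the abstract's claims of low computational requirements and of the tokens remaining private to the deployer rather than being part of the public checkpoint.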
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pulfer, B., Belousov, Y., Voloshynovskiy, S. (2025). Robustness Tokens: Towards Adversarial Robustness of Transformers. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15117. Springer, Cham. https://doi.org/10.1007/978-3-031-73202-7_7
DOI: https://doi.org/10.1007/978-3-031-73202-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73201-0
Online ISBN: 978-3-031-73202-7