
TIA: Token Importance Transferable Attack on Vision Transformers

  • Conference paper
Information Security and Cryptology (Inscrypt 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14527)


Abstract

Vision transformers (ViTs) have made significant progress in the past few years. However, recent research has revealed that ViTs are vulnerable to transfer-based attacks, in which an attacker uses a local surrogate model to generate adversarial examples and then transfers these malicious examples to attack a target black-box ViT directly. Under this threat, deploying ViTs in security-critical tasks is challenging, so exploring the robustness of ViTs against transfer-based attacks is a pressing need. However, existing transfer-based attack methods do not fully account for the unique structure of ViTs: they indiscriminately attack the intermediate output tokens, which concentrates the perturbation on model-specific information within the tokens and thus limits the transferability of the generated adversarial examples. To address these limitations, we propose the Token Importance Attack (TIA), a novel ViT-oriented transfer-based attack method. Specifically, we introduce a Randomly Shuffle Patches (RSP) strategy to expand the diversity of the input space: applying RSP to a single image yields multiple shuffled images, from which we obtain multiple token gradients. TIA then ensembles these token gradients into a guide map that focuses the perturbation on the model-independent information in the tokens rather than on model-specific information. Benefiting from these two components, TIA avoids overfitting to the surrogate model and thereby enhances the transferability of the crafted adversarial examples. Extensive experiments on common datasets with different ViTs and CNNs demonstrate the effectiveness of TIA.
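To make the two components concrete, the sketch below illustrates, in PyTorch, a patch-shuffling routine in the spirit of RSP and the ensembling of token gradients into a guide map. This is a minimal reconstruction from the abstract alone, not the authors' implementation; the `get_tokens_and_logits` wrapper, the cross-entropy loss, the 16-pixel patch size, and the number of shuffled copies are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def random_shuffle_patches(img, patch_size=16, generator=None):
    """Randomly permute the non-overlapping patches of one image (RSP-style).

    img: (C, H, W) tensor; H and W must be multiples of patch_size.
    """
    c, h, w = img.shape
    gh, gw = h // patch_size, w // patch_size
    # Cut the image into a (gh x gw) grid of patches, permute the grid,
    # then stitch the patches back into an image of the original size.
    patches = img.unfold(1, patch_size, patch_size).unfold(2, patch_size, patch_size)
    patches = patches.reshape(c, gh * gw, patch_size, patch_size)
    perm = torch.randperm(gh * gw, generator=generator)
    patches = patches[:, perm].reshape(c, gh, gw, patch_size, patch_size)
    return patches.permute(0, 1, 3, 2, 4).reshape(c, h, w)

def token_gradient_guide_map(get_tokens_and_logits, img, label, n_copies=5):
    """Average token gradients over several shuffled copies of `img`.

    get_tokens_and_logits: an assumed wrapper around the surrogate ViT that
        returns (tokens, logits), where `tokens` are the intermediate tokens
        of a chosen block and remain attached to the autograd graph.
    Returns a guide map with the same shape as the tokens.
    """
    guide = None
    for _ in range(n_copies):
        x = random_shuffle_patches(img).unsqueeze(0).requires_grad_(True)
        tokens, logits = get_tokens_and_logits(x)
        loss = F.cross_entropy(logits, label)
        # Gradient of the loss w.r.t. the intermediate tokens of this copy.
        (g,) = torch.autograd.grad(loss, tokens)
        guide = g if guide is None else guide + g
    return guide / n_copies
```

An attack built on this map could then, analogously to feature importance-aware attacks, iteratively perturb the clean image to suppress the token features the guide map weights most strongly; the exact attack loss and update rule are beyond what the abstract specifies.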



Acknowledgment

This work is supported in part by the National Natural Science Foundation of China under Grants 62162067 and 62101480 (Research and Application of Object Detection Based on Artificial Intelligence), and in part by the Yunnan Province Expert Workstations under Grant 202205AF150145.

Author information

Corresponding authors

Correspondence to Yuanyu Wang or Wei Zhou.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Fu, T., Li, F., Zhang, J., Zhu, L., Wang, Y., Zhou, W. (2024). TIA: Token Importance Transferable Attack on Vision Transformers. In: Ge, C., Yung, M. (eds) Information Security and Cryptology. Inscrypt 2023. Lecture Notes in Computer Science, vol 14527. Springer, Singapore. https://doi.org/10.1007/978-981-97-0945-8_6


  • DOI: https://doi.org/10.1007/978-981-97-0945-8_6

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0944-1

  • Online ISBN: 978-981-97-0945-8

  • eBook Packages: Computer Science, Computer Science (R0)
