Abstract
Vision transformers (ViTs) have made significant progress in the past few years. Recent research has revealed that ViTs are vulnerable to transfer-based attacks, in which an attacker uses a local surrogate model to generate adversarial examples and then transfers these malicious examples to attack the target black-box ViT directly. Under this threat, deploying ViTs in security-critical tasks is challenging, so exploring the robustness of ViTs against transfer-based attacks has become a pressing need. However, existing transfer-based attack methods do not fully consider the unique structure of ViTs: they indiscriminately attack the intermediate output tokens, so the perturbations concentrate on model-specific information within the tokens, which limits the transferability of the generated adversarial examples. To address these limitations, we propose the Token Importance Attack (TIA), a novel ViT-oriented transfer-based attack method. Specifically, we introduce a Randomly Shuffle Patches (RSP) strategy to expand the diversity of the input space. By applying RSP, we generate multiple shuffled images from a single image and thereby obtain multiple token gradients. TIA then ensembles the token gradients of these shuffled images into a guide map that focuses the perturbation on the model-independent information in the tokens rather than on model-specific information. Benefiting from these two components, TIA avoids overfitting to the surrogate model and thus enhances the transferability of the crafted adversarial examples. Extensive experiments on common datasets with different ViTs and CNNs demonstrate the effectiveness of TIA.
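The RSP strategy described above can be illustrated with a minimal sketch: split an image into the non-overlapping patches a ViT tokenizes, permute them, and reassemble. This is an assumption-laden illustration, not the paper's implementation; the function name `random_shuffle_patches` and the use of NumPy are ours, and the full attack would additionally compute token gradients on each shuffled view and ensemble them into a guide map.

```python
import numpy as np

def random_shuffle_patches(image, patch_size, rng=None):
    """Split an HxWxC image into non-overlapping patches and
    return a copy with the patches randomly permuted."""
    rng = np.random.default_rng(rng)
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    # Rearrange into a (gh, gw, p, p, c) grid of patches.
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    flat = patches.reshape(gh * gw, patch_size, patch_size, c)
    # Randomly permute the patch order.
    flat = flat[rng.permutation(gh * gw)]
    # Reassemble the shuffled grid back into an image.
    patches = flat.reshape(gh, gw, patch_size, patch_size, c)
    return patches.transpose(0, 2, 1, 3, 4).reshape(h, w, c)

# Several shuffled views of one image; in the full attack, token
# gradients from each view would be ensembled into a guide map.
img = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
views = [random_shuffle_patches(img, patch_size=2, rng=s) for s in range(3)]
```

Shuffling only permutes patch positions, so every view contains exactly the same pixel values as the original image; this is what lets the ensembled gradients emphasize content that survives across views.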
Acknowledgment
This work is supported in part by the National Natural Science Foundation of China under Grants 62162067 and 62101480 (Research and Application of Object Detection Based on Artificial Intelligence), and in part by the Yunnan Province expert workstations under Grant 202205AF150145.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Fu, T., Li, F., Zhang, J., Zhu, L., Wang, Y., Zhou, W. (2024). TIA: Token Importance Transferable Attack on Vision Transformers. In: Ge, C., Yung, M. (eds) Information Security and Cryptology. Inscrypt 2023. Lecture Notes in Computer Science, vol 14527. Springer, Singapore. https://doi.org/10.1007/978-981-97-0945-8_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0944-1
Online ISBN: 978-981-97-0945-8