Fooling Downstream Classifiers via Attacking Contrastive Learning Pre-trained Models

Li, Chenggang; Peng, Anjie; Zeng, Hui; Wu, Kaijun; Yu, Wenxin

doi:10.1007/978-981-99-8148-9_19

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1966)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Downloading a pre-trained contrastive learning (CL) encoder for feature extraction has become an emerging trend in computer vision. However, few works examine the security of downstream tasks when the upstream CL encoder is attacked with adversarial examples. In this paper, we propose an adversarial attack against a pre-trained CL encoder that aims to fool downstream classification tasks in black-box settings. To this end, we design a feature similarity loss function and optimize it to enlarge the feature difference between clean images and adversarial examples. Because the adversarial example forces the CL encoder to output distorted features at its last layer, it successfully fools downstream classifiers, which rely heavily on the encoder’s feature output. Experimental results on three typical pre-trained CL models and three downstream classifiers show that our attack achieves much higher attack success rates than state-of-the-art methods, especially when attacking the linear classifier.
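The idea of optimizing a feature similarity loss can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration, since the abstract does not give the exact loss or update rule: the encoder is a toy linear map (`encode`), the similarity is cosine similarity, and the update is an iterative sign-gradient step under an L-infinity budget `eps`, borrowed from I-FGSM-style attacks. The names `feature_attack`, `cosine_grad`, `eps`, `step`, and `iters` are hypothetical.

```python
import math
import random


def encode(weights, x):
    """Toy linear encoder (hypothetical stand-in for a pre-trained CL encoder)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def norm(v):
    return math.sqrt(sum(a * a for a in v))


def cosine(a, b):
    return sum(p * q for p, q in zip(a, b)) / (norm(a) * norm(b))


def cosine_grad(weights, x_adv, clean_feat):
    """Gradient of cosine(encode(x_adv), clean_feat) with respect to x_adv."""
    feat = encode(weights, x_adv)
    nf, nc = norm(feat), norm(clean_feat)
    cos = cosine(feat, clean_feat)
    # d cos / d feat = c / (|f||c|) - cos * f / |f|^2
    g_feat = [c / (nf * nc) - cos * f / (nf * nf) for f, c in zip(feat, clean_feat)]
    # Chain rule through the linear encoder: W^T g_feat
    return [sum(row[j] * g for row, g in zip(weights, g_feat))
            for j in range(len(x_adv))]


def feature_attack(weights, x, eps=0.1, step=0.02, iters=20, seed=0):
    """Push encode(x_adv) away from encode(x) while keeping |x_adv - x| <= eps."""
    rng = random.Random(seed)
    clean_feat = encode(weights, x)
    # Per-pixel bounds: inside the eps-ball around x and inside [0, 1].
    lo = [max(v - eps, 0.0) for v in x]
    hi = [min(v + eps, 1.0) for v in x]
    # Random start (assumption): the cosine gradient is zero at x_adv == x.
    x_adv = [min(max(v + rng.uniform(-eps, eps), l), h)
             for v, l, h in zip(x, lo, hi)]
    for _ in range(iters):
        grad = cosine_grad(weights, x_adv, clean_feat)
        # Sign-gradient step that decreases the feature similarity.
        stepped = [v - step * ((g > 0) - (g < 0)) for v, g in zip(x_adv, grad)]
        # Project back into the allowed range.
        x_adv = [min(max(v, l), h) for v, l, h in zip(stepped, lo, hi)]
    return x_adv


# Toy data: 8 "pixels" in [0, 1], 4 output features.
rng = random.Random(1)
W = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
x = [rng.random() for _ in range(8)]
x_adv = feature_attack(W, x)
print("cosine similarity after attack:", cosine(encode(W, x), encode(W, x_adv)))
```

The sketch never queries a downstream classifier; it only uses the encoder's output features, which is what makes the setting black-box with respect to the downstream model. With a real encoder the analytic gradient in `cosine_grad` would be replaced by automatic differentiation.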

Acknowledgement

This work was partially supported by NSFC (No. 62072484), the Sichuan Science and Technology Program (No. 2022YFG0321, No. 2022NSFSC0916), and the Opening Project of the Engineering Research Center of Digital Forensics, Ministry of Education.

Author information

Authors and Affiliations

Southwest University of Science and Technology, Mianyang, 621000, China
Chenggang Li, Anjie Peng, Hui Zeng & Wenxin Yu
Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology, Nanjing, China
Anjie Peng
Science and Technology On Communication Security Laboratory, Chengdu, 610022, China
Kaijun Wu

Corresponding author

Correspondence to Anjie Peng.

Editor information

Editors and Affiliations

School of Automation, Central South University, Changsha, China
Biao Luo
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Long Cheng
Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China
Zheng-Guang Wu
School of Automation, Guangdong University of Technology, Guangzhou, China
Hongyi Li
School of Electrical Engineering and Telecommunications, UNSW Sydney, Sydney, NSW, Australia
Chaojie Li

About this paper

Cite this paper

Li, C., Peng, A., Zeng, H., Wu, K., Yu, W. (2024). Fooling Downstream Classifiers via Attacking Contrastive Learning Pre-trained Models. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1966. Springer, Singapore. https://doi.org/10.1007/978-981-99-8148-9_19

DOI: https://doi.org/10.1007/978-981-99-8148-9_19
Published: 26 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8147-2
Online ISBN: 978-981-99-8148-9
eBook Packages: Computer Science, Computer Science (R0)
