Abstract
Downloading a pre-trained contrastive learning (CL) encoder for feature extraction is an emerging trend in computer vision. However, few works have examined the security of downstream tasks when the upstream CL encoder is attacked with adversarial examples. In this paper, we propose an adversarial attack against a pre-trained CL encoder that aims to fool downstream classification tasks in the black-box setting. To this end, we design a feature similarity loss function and optimize it to enlarge the feature difference between clean images and adversarial examples. Because the adversarial example forces the CL encoder to output distorted features at the last layer, it successfully fools the downstream classifiers, which rely heavily on the encoder’s feature output. Experimental results on three typical pre-trained CL models and three downstream classifiers show that our attack achieves much higher attack success rates than state-of-the-art methods, especially when attacking the linear classifier.
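The idea of optimizing a feature similarity loss can be illustrated with a minimal sketch. The abstract does not give the exact loss or optimizer, so everything here is an assumption made for illustration: cosine similarity stands in for the feature similarity measure, a toy linear map (`encode`) stands in for the pre-trained CL encoder, the perturbation is bounded by an L-infinity budget `eps`, and the update is a sign-gradient step in the style of iterative FGSM. All function names and parameters are hypothetical.

```python
import math

def encode(weights, x):
    """Toy linear stand-in for a CL encoder (assumption): features = W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def norm(v):
    return math.sqrt(sum(a * a for a in v)) + 1e-12

def cosine(a, b):
    return sum(p * q for p, q in zip(a, b)) / (norm(a) * norm(b))

def similarity_grad(weights, x_adv, clean_feat):
    """Analytic gradient of cos(encode(x_adv), clean_feat) with respect to x_adv."""
    feat = encode(weights, x_adv)
    nf, nc = norm(feat), norm(clean_feat)
    cos = cosine(feat, clean_feat)
    # d cos / d feat = c / (|f||c|) - cos * f / |f|^2
    d_feat = [c / (nf * nc) - cos * f / (nf * nf) for f, c in zip(feat, clean_feat)]
    # Chain rule through the linear encoder: grad_x = W^T d_feat
    return [sum(row[j] * g for row, g in zip(weights, d_feat)) for j in range(len(x_adv))]

def sign(v):
    return (v > 0) - (v < 0)

def feature_attack(weights, x_clean, eps=0.1, step=0.02, iters=20):
    """Lower the feature similarity to the clean image within an L-infinity budget."""
    clean_feat = encode(weights, x_clean)
    lo = [max(0.0, c - eps) for c in x_clean]   # stay inside the budget and [0, 1]
    hi = [min(1.0, c + eps) for c in x_clean]
    # Start slightly off the clean image: the gradient of the cosine
    # similarity vanishes exactly at x_adv == x_clean.
    x_adv = [min(max(c + (eps / 2) * (-1) ** j, l), h)
             for j, (c, l, h) in enumerate(zip(x_clean, lo, hi))]
    best_x, best_sim = list(x_adv), cosine(encode(weights, x_adv), clean_feat)
    for _ in range(iters):
        grad = similarity_grad(weights, x_adv, clean_feat)
        # Step against the gradient to reduce similarity, then project back.
        x_adv = [min(max(v - step * sign(g), l), h)
                 for v, g, l, h in zip(x_adv, grad, lo, hi)]
        sim = cosine(encode(weights, x_adv), clean_feat)
        if sim < best_sim:                       # keep the most dissimilar iterate
            best_x, best_sim = list(x_adv), sim
    return best_x, best_sim

# Hypothetical 3-feature encoder and 4-pixel "image".
weights = [[1.0, 0.0, 0.5, -0.5], [0.0, 1.0, -0.5, 0.5], [0.5, -0.5, 1.0, 0.0]]
x_clean = [0.5, 0.4, 0.6, 0.3]
x_adv, sim = feature_attack(weights, x_clean)
```

Note that the sketch never queries a downstream classifier. It only needs the encoder's features, which is what makes an attack of this kind black-box with respect to the downstream task. With a real encoder, the analytic gradient would be replaced by automatic differentiation.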
Acknowledgement
This work was partially supported by NSFC (No. 62072484), the Sichuan Science and Technology Program (No. 2022YFG0321, No. 2022NSFSC0916), and the Opening Project of Engineering Research Center of Digital Forensics, Ministry of Education.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, C., Peng, A., Zeng, H., Wu, K., Yu, W. (2024). Fooling Downstream Classifiers via Attacking Contrastive Learning Pre-trained Models. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1966. Springer, Singapore. https://doi.org/10.1007/978-981-99-8148-9_19
DOI: https://doi.org/10.1007/978-981-99-8148-9_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8147-2
Online ISBN: 978-981-99-8148-9
eBook Packages: Computer Science, Computer Science (R0)