Abstract
Vision Transformers (ViTs) have demonstrated outstanding performance on visual tasks. However, deploying ViT models and running inference on resource-constrained edge devices is challenging because of their high computational overhead. Existing quantization methods require access to the original training data, which raises security and privacy concerns. To address this issue, this paper proposes a data-free quantization method named Perturbation-Aware Vision Transformer (PA-ViT), which enhances the robustness of synthetic images and thereby improves the performance of downstream post-training quantization. Specifically, PA-ViT perturbs the synthetic images and then models the inconsistency between the attention maps and predicted labels of the perturbed and unperturbed images as processed by the full-precision (FP) model; a loss function built on this inconsistency guides the generation of robust images. Experimental results on ImageNet show significant improvements over existing techniques, and the method even surpasses quantization with real data. For instance, with Swin-T as the backbone, PA-ViT improves top-1 accuracy over the state-of-the-art method by 5.29% and 4.93% when quantized to 4-bit and 8-bit precision, respectively, providing an effective solution for data-free post-training quantization of vision transformers.
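To make the idea concrete, the following is a minimal PyTorch sketch of a perturbation-aware consistency objective of this kind. It assumes the frozen full-precision ViT is wrapped so that a forward pass returns both logits and stacked attention maps; the additive Gaussian perturbation, the loss weights, and the `fp_model` wrapper interface are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a perturbation-aware consistency loss for data-free image
# synthesis.  Assumption: the frozen full-precision ViT wrapper returns
# (logits, attention_maps) in a single forward pass.
import torch
import torch.nn.functional as F


def perturbation_aware_loss(fp_model, synth_images, noise_std=0.05,
                            attn_weight=1.0, label_weight=1.0):
    """Encourage synthetic images whose attention maps and predictions
    stay consistent under small input perturbations."""
    # Perturb the current synthetic batch (assumption: additive Gaussian noise).
    perturbed = synth_images + noise_std * torch.randn_like(synth_images)

    # The frozen FP model is assumed to return (logits, attention_maps),
    # e.g. attention_maps of shape [layers, batch, heads, tokens, tokens].
    logits_clean, attn_clean = fp_model(synth_images)
    logits_pert, attn_pert = fp_model(perturbed)

    # Inconsistency between attention maps of clean vs. perturbed inputs.
    attn_loss = F.mse_loss(attn_pert, attn_clean.detach())

    # Inconsistency between predicted label distributions (KL divergence).
    label_loss = F.kl_div(F.log_softmax(logits_pert, dim=-1),
                          F.softmax(logits_clean.detach(), dim=-1),
                          reduction="batchmean")

    # Combined objective that steers the image-update step toward
    # perturbation-robust synthetic samples.
    return attn_weight * attn_loss + label_weight * label_loss


# Usage sketch: synthetic images are learnable tensors optimized with this
# loss, typically alongside other data-free synthesis objectives.
# synth_images = torch.randn(8, 3, 224, 224, requires_grad=True)
# optimizer = torch.optim.Adam([synth_images], lr=0.01)
# loss = perturbation_aware_loss(fp_vit, synth_images)
# loss.backward(); optimizer.step()
```

In this sketch the synthetic images themselves are the optimization variables, and the consistency terms penalize images whose attention maps or predictions change under perturbation, which is one way to realize the robustness objective described in the abstract.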