Abstract
Robust quantization improves the tolerance of networks to varying implementations, allowing reliable outputs under different bit-widths or fragmented low-precision arithmetic. In this work, we perform extensive analyses to identify the sources of quantization error and present three insights to robustify a network against quantization: reduction of error propagation, range clamping for error minimization, and inherited robustness against quantization. Based on these insights, we propose two novel methods called symmetry regularization (SymReg) and saturating nonlinearity (SatNL). Applying the proposed methods during training enhances the robustness of arbitrary neural networks to quantization under existing post-training quantization (PTQ) and quantization-aware training (QAT) algorithms, and yields a single set of weights flexible enough to maintain output quality under various conditions. We conduct extensive studies on the CIFAR and ImageNet datasets and validate the effectiveness of the proposed methods.
S. Park and Y. Jang—Equal contribution.
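The exact formulations of SymReg and SatNL are not reproduced on this page, so the following is only a minimal PyTorch-style sketch of how the two ideas from the abstract could be wired into training: a regularizer that pushes each layer's weight distribution toward symmetry around zero, and a bounded activation that saturates large pre-activations so they cannot amplify quantization error downstream. The function names, the per-channel mean penalty, the tanh-based saturation, and the weighting factor are illustrative assumptions, not the authors' definitions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def symmetry_regularizer(weight: torch.Tensor) -> torch.Tensor:
    # Hypothetical penalty: the squared mean of each output channel's weights,
    # which vanishes when the channel's distribution is balanced around zero.
    w = weight.flatten(1) if weight.dim() > 1 else weight.unsqueeze(0)
    return (w.mean(dim=1) ** 2).sum()

class SaturatingNonlinearity(nn.Module):
    # Hypothetical bounded activation: roughly linear near zero, saturating
    # smoothly so large pre-activations stay inside a fixed range.
    def __init__(self, bound: float = 6.0):
        super().__init__()
        self.bound = bound

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.bound * torch.tanh(x / self.bound)

# Usage: add the regularizer to the task loss during full-precision training.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3), SaturatingNonlinearity(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
)
x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
task_loss = F.cross_entropy(model(x), y)
reg_loss = sum(symmetry_regularizer(m.weight)
               for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear)))
(task_loss + 1e-4 * reg_loss).backward()   # 1e-4 is an assumed weighting factor
```

The same trained weights would then be handed to a PTQ or QAT pipeline; nothing in the sketch is specific to a particular bit-width.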
Notes
- 1. Note that we intentionally use different expressions to distinguish quantization's truncation from the clamping of full-precision data; a small illustrative sketch follows these notes.
- 2. The column of mixed-precision results is omitted in Table 1 for brevity.
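To make the terminology in Note 1 concrete, here is a minimal illustrative sketch (helper names and values are assumptions, not from the paper): clamping limits full-precision data to a range while keeping it in full precision, whereas quantization's truncation snaps values onto a discrete grid and cuts off anything outside the representable integer range.

```python
import torch

def clamp_fp(x: torch.Tensor, lo: float, hi: float) -> torch.Tensor:
    # Clamping: limits full-precision values to [lo, hi];
    # the result is still full precision.
    return x.clamp(lo, hi)

def quantize_uint(x: torch.Tensor, scale: float, bits: int = 8) -> torch.Tensor:
    # Truncation happens inside quantization: values are rounded onto a
    # discrete grid and anything outside the integer range is cut off.
    qmax = 2 ** bits - 1
    q = torch.round(x / scale).clamp(0, qmax)
    return q * scale  # dequantized only to make the comparison readable

x = torch.tensor([-0.30, 0.02, 0.70, 3.50])
print(clamp_fp(x, 0.0, 1.0))                 # full-precision, range-limited
print(quantize_uint(x, scale=1.0 / 255.0))   # values snapped to the uint8 grid
```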
Acknowledgements
This work was supported by IITP grants funded by the Korea government (MSIT; No. 2019-0-01906, No. 2021-0-00105, and No. 2021-0-00310), SK Hynix Inc., and Google Asia Pacific.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Park, S., Jang, Y., Park, E. (2022). Symmetry Regularization and Saturating Nonlinearity for Robust Quantization. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13671. Springer, Cham. https://doi.org/10.1007/978-3-031-20083-0_13
DOI: https://doi.org/10.1007/978-3-031-20083-0_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20082-3
Online ISBN: 978-3-031-20083-0
eBook Packages: Computer Science, Computer Science (R0)