Symmetry Regularization and Saturating Nonlinearity for Robust Quantization

  • Conference paper
  • Published in: Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

Robust quantization improves the tolerance of networks to various implementations, allowing reliable outputs across different bit-widths or fragmented low-precision arithmetic. In this work, we perform extensive analyses to identify the sources of quantization error and present three insights for robustifying a network against quantization: reduction of error propagation, range clamping for error minimization, and inherited robustness against quantization. Based on these insights, we propose two novel methods called symmetry regularization (SymReg) and saturating nonlinearity (SatNL). Applying the proposed methods during training enhances the robustness of arbitrary neural networks against quantization under existing post-training quantization (PTQ) and quantization-aware training (QAT) algorithms, and enables us to obtain a single set of weights flexible enough to maintain output quality under various conditions. We conduct extensive studies on the CIFAR and ImageNet datasets and validate the effectiveness of the proposed methods.

S. Park and Y. Jang—Equal contribution.
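
The two proposed components lend themselves to a compact illustration. The PyTorch sketch below pairs a skewness-style penalty that pushes each layer's weight distribution toward symmetry around zero (a stand-in for SymReg) with a hardtanh-style clamped activation that bounds the dynamic range seen by the quantizer (a stand-in for SatNL). The penalty form, the saturation bound `t`, and the strength `lam` are illustrative assumptions, not the paper's exact formulations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def symmetry_penalty(weight: torch.Tensor) -> torch.Tensor:
    """Penalize asymmetry of the weight distribution around its mean.

    A normalized third-central-moment (skewness) surrogate is used here;
    this is an illustrative choice, not the paper's definition of SymReg.
    """
    w = weight.flatten()
    centered = w - w.mean()
    std = centered.std() + 1e-8
    skew = (centered ** 3).mean() / std ** 3
    return skew ** 2  # vanishes when the third central moment is zero


class SaturatingActivation(nn.Module):
    """Hardtanh-style saturating nonlinearity.

    Clamps activations to a bounded range so quantization grids of
    different bit-widths see the same dynamic range. The bound `t`
    is an assumed hyperparameter.
    """

    def __init__(self, t: float = 4.0):
        super().__init__()
        self.t = t

    def forward(self, x):
        return torch.clamp(x, -self.t, self.t)


# Schematic training step: add the symmetry penalty to the task loss.
model = nn.Sequential(nn.Linear(128, 64), SaturatingActivation(), nn.Linear(64, 10))
x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
lam = 1e-3  # regularization strength (assumed hyperparameter)
loss = F.cross_entropy(model(x), y)
loss = loss + lam * sum(
    symmetry_penalty(p) for n, p in model.named_parameters() if "weight" in n
)
loss.backward()
```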

Notes

  1. Please note that we intentionally use different expressions to distinguish quantization’s truncation from the clamping of full-precision data (a sketch contrasting the two follows these notes).

  2. Note that the column of mixed-precision results is omitted in Table 1 for brevity.
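
To make the distinction in Note 1 concrete, the minimal sketch below contrasts clamping, which bounds full-precision values without discretizing them, with uniform quantization, which both truncates values to the representable range and rounds them onto a discrete grid. The step size and bit-width are arbitrary choices for illustration.

```python
import torch

x = torch.randn(1000) * 3.0  # full-precision data

# Clamping: bounds the range but keeps values at full precision.
clamped = torch.clamp(x, -2.0, 2.0)

# Quantization: values are truncated to the representable range AND
# rounded onto a uniform grid (here a signed 4-bit grid with an
# arbitrary step size of 0.25).
bits, step = 4, 0.25
qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
quantized = torch.clamp(torch.round(x / step), qmin, qmax) * step
```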

Acknowledgements

This work was supported by IITP grants funded by the Korea government (MSIT; No. 2019-0-01906, No. 2021-0-00105, and No. 2021-0-00310), SK Hynix Inc., and Google Asia Pacific.

Author information

Corresponding author

Correspondence to Eunhyeok Park.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 360 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Park, S., Jang, Y., Park, E. (2022). Symmetry Regularization and Saturating Nonlinearity for Robust Quantization. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13671. Springer, Cham. https://doi.org/10.1007/978-3-031-20083-0_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-20083-0_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20082-3

  • Online ISBN: 978-3-031-20083-0

  • eBook Packages: Computer Science; Computer Science (R0)
