Symmetry Regularization and Saturating Nonlinearity for Robust Quantization

  • Conference paper
  • Published in: Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

Robust quantization improves the tolerance of networks to various implementations, allowing reliable outputs across different bit-widths or fragmented low-precision arithmetic. In this work, we perform extensive analyses to identify the sources of quantization error and present three insights for robustifying a network against quantization: reduction of error propagation, range clamping for error minimization, and inherited robustness against quantization. Based on these insights, we propose two novel methods called symmetry regularization (SymReg) and saturating nonlinearity (SatNL). Applying the proposed methods during training enhances the robustness of arbitrary neural networks against quantization under existing post-training quantization (PTQ) and quantization-aware training (QAT) algorithms, and enables us to obtain a single set of weights flexible enough to maintain output quality under various conditions. We conduct extensive studies on the CIFAR and ImageNet datasets and validate the effectiveness of the proposed methods.

S. Park and Y. Jang—Equal contribution.
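
The two proposed components lend themselves to a compact illustration. The PyTorch sketch below pairs a skewness-style penalty that pushes each layer's weight distribution toward symmetry around zero (a stand-in for SymReg) with a hardtanh-style clamped activation that bounds the dynamic range seen by the quantizer (a stand-in for SatNL). The penalty form, the saturation bound `t`, and the strength `lam` are illustrative assumptions, not the paper's exact formulations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def symmetry_penalty(weight: torch.Tensor) -> torch.Tensor:
    """Penalize asymmetry of the weight distribution around its mean.

    A normalized third-central-moment (skewness) surrogate is used here;
    this is an illustrative choice, not the paper's definition of SymReg.
    """
    w = weight.flatten()
    centered = w - w.mean()
    std = centered.std() + 1e-8
    skew = (centered ** 3).mean() / std ** 3
    return skew ** 2  # vanishes when the third central moment is zero


class SaturatingActivation(nn.Module):
    """Hardtanh-style saturating nonlinearity.

    Clamps activations to a bounded range so quantization grids of
    different bit-widths see the same dynamic range. The bound `t`
    is an assumed hyperparameter.
    """

    def __init__(self, t: float = 4.0):
        super().__init__()
        self.t = t

    def forward(self, x):
        return torch.clamp(x, -self.t, self.t)


# Schematic training step: add the symmetry penalty to the task loss.
model = nn.Sequential(nn.Linear(128, 64), SaturatingActivation(), nn.Linear(64, 10))
x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
lam = 1e-3  # regularization strength (assumed hyperparameter)
loss = F.cross_entropy(model(x), y)
loss = loss + lam * sum(
    symmetry_penalty(p) for n, p in model.named_parameters() if "weight" in n
)
loss.backward()
```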

Notes

  1. Please note that we intentionally use different expressions to distinguish quantization’s truncation from the clamping of full-precision data (a sketch contrasting the two follows these notes).

  2. Note that the column of mixed-precision results is omitted in Table 1 for brevity.
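
To make the distinction in Note 1 concrete, the minimal sketch below contrasts clamping, which bounds full-precision values without discretizing them, with uniform quantization, which both truncates values to the representable range and rounds them onto a discrete grid. The step size and bit-width are arbitrary choices for illustration.

```python
import torch

x = torch.randn(1000) * 3.0  # full-precision data

# Clamping: bounds the range but keeps values at full precision.
clamped = torch.clamp(x, -2.0, 2.0)

# Quantization: values are truncated to the representable range AND
# rounded onto a uniform grid (here a signed 4-bit grid with an
# arbitrary step size of 0.25).
bits, step = 4, 0.25
qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
quantized = torch.clamp(torch.round(x / step), qmin, qmax) * step
```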

Acknowledgements

This work was supported by IITP grants funded by the Korea government (MSIT; No. 2019-0-01906, No. 2021-0-00105, and No. 2021-0-00310), SK Hynix Inc., and Google Asia Pacific.

Author information

Corresponding author

Correspondence to Eunhyeok Park.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 360 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Park, S., Jang, Y., Park, E. (2022). Symmetry Regularization and Saturating Nonlinearity for Robust Quantization. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13671. Springer, Cham. https://doi.org/10.1007/978-3-031-20083-0_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-20083-0_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20082-3

  • Online ISBN: 978-3-031-20083-0

  • eBook Packages: Computer Science; Computer Science (R0)
