Non-uniform Step Size Quantization for Accurate Post-training Quantization

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

Quantization is a very effective optimization technique to reduce the hardware cost and memory footprint of deep neural network (DNN) accelerators. In particular, post-training quantization (PTQ) is often preferred as it does not require a full dataset or costly retraining. However, the performance of PTQ lags significantly behind that of quantization-aware training, especially for low-precision networks (\(\le \)4-bit). In this paper, we propose a novel PTQ scheme (code will be publicly available at https://github.com/sogh5/SubsetQ) to bridge the gap, with minimal impact on hardware cost. The main idea of our scheme is to increase arithmetic precision while retaining the same representational precision. The excess arithmetic precision enables us to better match the input data distribution, while also presenting a new optimization problem, to which we propose a novel search-based solution. Our scheme is based on logarithmic-scale quantization, which can help reduce hardware cost through the use of shifters instead of multipliers. Our evaluation results using various DNN models on challenging computer vision tasks (image classification, object detection, semantic segmentation) show superior accuracy compared with state-of-the-art PTQ methods at various low-bit precisions.
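
The abstract's hardware-cost argument (log-scale quantization, shifters instead of multipliers) can be made concrete with a minimal sketch. The code below illustrates only baseline power-of-two (log-scale) quantization, not the proposed non-uniform step size scheme; the function names, the 4-bit budget, and the unit clipping value are assumptions made for this example.

    import numpy as np

    def log2_quantize(w, n_bits=4, clip_val=1.0):
        """Map each weight to a signed power of two: w_q = sign(w) * clip_val * 2**(-e)."""
        sign = np.sign(w)
        mag = np.clip(np.abs(w) / clip_val, 1e-12, 1.0)
        # Round the exponent in the log domain; one bit is spent on the sign,
        # the remaining bits encode the (non-negative) exponent.
        e = np.clip(np.round(-np.log2(mag)), 0, 2 ** (n_bits - 1) - 1).astype(int)
        return sign * clip_val * 2.0 ** (-e), e, sign.astype(np.int64)

    def shift_accumulate(x_int, exponents, signs):
        """Dot product with power-of-two weights: bit shifts replace multiplications."""
        return np.sum(signs * (x_int >> exponents))

    w = np.array([0.7, -0.3, 0.05, -0.9])
    w_q, e, s = log2_quantize(w)
    x = np.array([64, 32, 16, 8], dtype=np.int64)
    print(w_q)                        # [ 0.5  -0.25  0.0625 -1. ]
    print(shift_accumulate(x, e, s))  # 64>>1 - 32>>2 + 16>>4 - 8>>0 = 32 - 8 + 1 - 8 = 17

Because every quantized weight is a signed, scaled power of two, the multiply-accumulate reduces to a shift-accumulate, which is the hardware-cost benefit the abstract refers to.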

Notes

  1. Quantization points are similar to quantization levels, but with some differences. Whereas quantization levels are often integers and may have a different scale than the quantization thresholds, quantization points have the same scale as the quantization thresholds and can be used as a substitute for them. For example, a uniform quantizer with step size \(s\) may use integer levels \(\{0, 1, 2, \dots \}\) while its quantization points are \(\{0, s, 2s, \dots \}\), on the same scale as the thresholds.

  2. One may design a quantizer that outputs a non-nearest element; this is suboptimal but may be motivated by computational efficiency. An example is log-scale quantization, which was defined [16] as rounding in the logarithmic domain, which does not necessarily yield the nearest element in the linear domain (see the short sketch after these notes).

  3. https://github.com/yhhhli/BRECQ.

  4. For InceptionV3 at 4-bit in Table 3, we present only the result with 8-bit linear quantization, because our implementation for low-bit activations [19] did not work properly in this case.

  5. https://github.com/jfzhang95/pytorch-deeplab-xception.

  6. https://github.com/qfgaohao/pytorch-ssd.
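
The point in note 2 can be seen with a toy example (not taken from the paper's code): rounding in the logarithmic domain may select a representable value that is not the nearest one in the linear domain. The three-point grid and the sample value below are made up for illustration.

    import numpy as np

    grid = np.array([0.25, 0.5, 1.0])               # representable powers of two

    x = 0.72
    log_domain = 2.0 ** np.round(np.log2(x))        # round the exponent: 2**round(-0.47) = 1.0
    nearest    = grid[np.argmin(np.abs(grid - x))]  # 0.5 is closer (0.22 vs. 0.28 away)
    print(log_domain, nearest)                      # 1.0 0.5

The discrepancy arises because the log-domain midpoint between 0.5 and 1.0 is \(2^{-0.5} \approx 0.707\), whereas the linear-domain midpoint is 0.75, so any input between the two is rounded differently by the two rules.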

References

  1. Banner, R., Nahshan, Y., Soudry, D.: Post training 4-bit quantization of convolutional networks for rapid-deployment. In: Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc. (2019). https://proceedings.neurips.cc/paper/2019/file/c0a62e133894cdce435bcb4a5df1db2d-Paper.pdf

  2. Chen, C., Chen, Q., Xu, J., Koltun, V.: Learning to see in the dark. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3291–3300 (2018)

  3. Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, Atrous convolution, and fully connected CRFs. CoRR abs/1606.00915 (2016). arxiv.org/abs/1606.00915

  4. Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: European Conference on Computer Vision (ECCV), pp. 801–818 (2018)

  5. Chen, Y., Krishna, T., Emer, J.S., Sze, V.: Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE J. Solid-State Circuits 52(1), 127–138 (2017). https://doi.org/10.1109/JSSC.2016.2616357

  6. Choi, J., Wang, Z., Venkataramani, S., Chuang, P.I.J., Srinivasan, V., Gopalakrishnan, K.: PACT: parameterized clipping activation for quantized neural networks. arXiv preprint arXiv:1805.06085 (2018)

  7. Choukroun, Y., Kravchik, E., Yang, F., Kisilev, P.: Low-bit quantization of neural networks for efficient inference. In: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 3009–3018 (2019). https://doi.org/10.1109/ICCVW.2019.00363

  8. Ding, R., Liu, Z., Chin, T.W., Marculescu, D., Blanton, R.D.S.: FlightNNs: lightweight quantized deep neural networks for fast and accurate inference. In: Proceedings of the 56th Annual Design Automation Conference 2019, DAC 2019, Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3316781.3317828

  9. Esser, S.K., McKinstry, J.L., Bablani, D., Appuswamy, R., Modha, D.S.: Learned step size quantization. In: International Conference on Learning Representations (2019)

  10. Fang, J., Shafiee, A., Abdel-Aziz, H., Thorsley, D., Georgiadis, G., Hassoun, J.H.: Post-training piecewise linear quantization for deep neural networks. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12347, pp. 69–86. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58536-5_5

  11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (2016)

  12. Hubara, I., Nahshan, Y., Hanani, Y., Banner, R., Soudry, D.: Improving post training neural quantization: layer-wise calibration and integer programming. CoRR abs/2006.10518 (2020). arxiv.org/abs/2006.10518

  13. Jacob, B., et al.: Quantization and training of neural networks for efficient integer-arithmetic-only inference. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2704–2713 (2018)

  14. Jung, S., et al.: Learning to quantize deep networks by optimizing quantization intervals with task loss. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4350–4359 (2019)

  15. Krishnamoorthi, R.: Quantizing deep convolutional networks for efficient inference: a whitepaper. arXiv preprint arXiv:1806.08342 (2018)

  16. Lee, E.H., Miyashita, D., Chai, E., Murmann, B., Wong, S.S.: LogNet: energy-efficient neural networks using logarithmic computation. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5900–5904 (2017). https://doi.org/10.1109/ICASSP.2017.7953288

  17. Lee, S., Sim, H., Choi, J., Lee, J.: Successive log quantization for cost-efficient neural networks using stochastic computing. In: Proceedings of the 56th Annual Design Automation Conference 2019, DAC 2019. Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3316781.3317916

  18. Li, Y., Dong, X., Wang, W.: Additive powers-of-two quantization: an efficient non-uniform discretization for neural networks. In: International Conference on Learning Representations (2020)

  19. Li, Y., et al.: BRECQ: pushing the limit of post-training quantization by block reconstruction. In: International Conference on Learning Representations (2021)

  20. Lim, B., Son, S., Kim, H., Nah, S., Mu Lee, K.: Enhanced deep residual networks for single image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 136–144 (2017)

  21. Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2

  22. Nagel, M., van Baalen, M., Blankevoort, T., Welling, M.: Data-free quantization through weight equalization and bias correction. In: IEEE/CVF International Conference on Computer Vision (2019)

  23. Nagel, M., Amjad, R.A., van Baalen, M., Louizos, C., Blankevoort, T.: Up or down? Adaptive rounding for post-training quantization. CoRR abs/2004.10568 (2020). arxiv.org/abs/2004.10568

  24. Nahshan, Y., et al.: Loss aware post-training quantization. CoRR abs/1911.07190 (2019). arxiv.org/abs/1911.07190

  25. Oh, S., Sim, H., Lee, S., Lee, J.: Automated log-scale quantization for low-cost deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 742–751 (2021)

  26. Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (2018)

  27. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. CoRR abs/1512.00567 (2015). arxiv.org/abs/1512.00567

  28. Wang, P., Chen, Q., He, X., Cheng, J.: Towards accurate post-training network quantization via bit-split and stitching. In: International Conference on Machine Learning (2020)

  29. Zhao, R., Hu, Y., Dotzel, J., Sa, C.D., Zhang, Z.: Improving neural network quantization without retraining using outlier channel splitting. In: International Conference on Machine Learning (2019)

  30. Zhao, X., Wang, Y., Cai, X., Liu, C., Zhang, L.: Linear symmetric quantization of neural networks for low-precision integer hardware. In: International Conference on Learning Representations (2020). https://openreview.net/forum?id=H1lBj2VFPS

Acknowledgements

This work was supported by the Samsung Advanced Institute of Technology, Samsung Electronics Co., Ltd., by IITP grants (No. 2020-0-01336, Artificial Intelligence Graduate School Program (UNIST), and No. 1711080972, Neuromorphic Computing Software Platform for Artificial Intelligence Systems) and NRF grant (No. 2020R1A2C2015066) funded by MSIT of Korea, and by Free Innovative Research Fund of UNIST (1.170067.01).

Author information

Corresponding author

Correspondence to Jongeun Lee.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 3132 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Oh, S., Sim, H., Kim, J., Lee, J. (2022). Non-uniform Step Size Quantization for Accurate Post-training Quantization. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13671. Springer, Cham. https://doi.org/10.1007/978-3-031-20083-0_39

  • DOI: https://doi.org/10.1007/978-3-031-20083-0_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20082-3

  • Online ISBN: 978-3-031-20083-0

  • eBook Packages: Computer Science, Computer Science (R0)
