
BASQ: Branch-wise Activation-clipping Search Quantization for Sub-4-bit Neural Networks

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

In this paper, we propose Branch-wise Activation-clipping Search Quantization (BASQ), a novel quantization method for low-bit activations. BASQ optimizes the clip value in a continuous search space while simultaneously searching the L2 decay weight factor used to update the clip value in a discrete search space. We also propose a novel block structure for low precision that works properly on both MobileNet and ResNet structures with branch-wise searching. We evaluate the proposed methods by quantizing both weights and activations to 4 bits or lower. Contrary to existing methods, which are effective only for redundant networks (e.g., ResNet-18) or highly optimized networks (e.g., MobileNet-v2), our proposed method remains competitive on both types of networks across low precisions from 2 to 4 bits. Specifically, our 2-bit MobileNet-v2 achieves a top-1 accuracy of 64.71% on ImageNet, outperforming the existing method by a large margin (2.8%), and our 4-bit MobileNet-v2 reaches 71.98%, comparable to the full-precision accuracy of 71.88%. Moreover, our uniform quantization method makes 2-bit ResNet-18 comparable in accuracy to the state-of-the-art non-uniform quantization method. Source code is available at https://github.com/HanByulKim/BASQ.
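As background for the clip-value search described above, the sketch below shows clipped uniform activation quantization in the style of PACT [8], where a learnable clip value `alpha` bounds the activation range before uniform rounding. The function and parameter names are illustrative assumptions, not the paper's implementation; BASQ additionally searches `alpha` and its L2 decay weight branch-wise.

```python
import numpy as np

def quantize_activation(x, alpha, n_bits):
    """Clipped uniform quantization of non-negative activations.

    x:      activation tensor
    alpha:  clip value (learnable in PACT-style methods)
    n_bits: target bit width (e.g., 2 to 4 for sub-4-bit networks)
    """
    levels = 2 ** n_bits - 1                 # number of quantization steps
    x_clipped = np.clip(x, 0.0, alpha)       # restrict range to [0, alpha]
    scale = alpha / levels                   # uniform step size
    return np.round(x_clipped / scale) * scale
```

During training, methods of this family propagate gradients through the rounding step with a straight-through estimator [2] and update `alpha` by backpropagation; the discrete choice BASQ searches is the L2 decay weight applied to that update.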


Notes

  1. In [13], the accuracy is evaluated differently: about 16% of the validation set is used for architecture selection, and the test set is then constructed from, and evaluated on, the entire validation set. To avoid such duplicate use of the same data for both architecture selection and evaluation, we adopt k-fold evaluation.
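To make the k-fold protocol concrete, the following sketch (a hypothetical helper, not the authors' code) partitions validation-set indices into k disjoint folds so that, within each split, the images used for architecture selection never overlap with those used for the final evaluation:

```python
import numpy as np

def kfold_splits(n_samples, k):
    """Partition sample indices into k disjoint folds.

    Returns k (selection_idx, eval_idx) pairs; each fold is held out
    once for evaluation while the remaining folds drive selection.
    """
    folds = np.array_split(np.arange(n_samples), k)
    splits = []
    for i in range(k):
        eval_idx = folds[i]
        sel_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        splits.append((sel_idx, eval_idx))
    return splits
```

Averaging accuracy over the k held-out folds then gives an estimate that never reuses a selection image for evaluation.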

References

  1. Bai, H., Cao, M., Huang, P., Shan, J.: BatchQuant: quantized-for-all architecture search with robust quantizer. In: Advances in Neural Information Processing Systems (NeurIPS), vol. 34 (2021)

  2. Bengio, Y., Léonard, N., Courville, A.: Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432 (2013)

  3. Bulat, A., Martinez, B., Tzimiropoulos, G.: BATS: binary architecture search. In: European Conference on Computer Vision (ECCV) (2020)

  4. Cai, H., Zhu, L., Han, S.: ProxylessNAS: direct neural architecture search on target task and hardware. In: International Conference on Learning Representations (ICLR) (2019)

  5. Chen, P., Liu, J., Zhuang, B., Tan, M., Shen, C.: Towards accurate quantized object detection. In: Computer Vision and Pattern Recognition (CVPR) (2021)

  6. Chen, X., Xie, L., Wu, J., Tian, Q.: Progressive differentiable architecture search: bridging the depth gap between search and evaluation. In: International Conference on Computer Vision (ICCV) (2019)

  7. Choi, J., Venkataramani, S., Srinivasan, V., Gopalakrishnan, K., Wang, Z., Chuang, P.: Accurate and efficient 2-bit quantized neural networks. In: Proceedings of Machine Learning and Systems, vol. 1, pp. 348–359 (2019)

  8. Choi, J., Wang, Z., Venkataramani, S., Chuang, P., Srinivasan, V., Gopalakrishnan, K.: PACT: parameterized clipping activation for quantized neural networks. arXiv preprint arXiv:1805.06085 (2018)

  9. Chu, X., Zhang, B., Xu, R.: FairNAS: rethinking evaluation fairness of weight sharing neural architecture search. In: International Conference on Computer Vision (ICCV) (2021)

  10. Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Computer Vision and Pattern Recognition (CVPR) (2009)

  11. Esser, S., McKinstry, J., Bablani, D., Appuswamy, R., Modha, D.: Learned step size quantization. In: International Conference on Learning Representations (ICLR) (2020)

  12. Gong, R., et al.: Differentiable soft quantization: bridging full-precision and low-bit neural networks. In: International Conference on Computer Vision (ICCV) (2019)

  13. Guo, Z., et al.: Single path one-shot neural architecture search with uniform sampling. In: European Conference on Computer Vision (ECCV) (2020)

  14. Habi, H., Jennings, R., Netzer, A.: HMQ: hardware friendly mixed precision quantization block for CNNs. In: European Conference on Computer Vision (ECCV) (2020)

  15. Han, T., Li, D., Liu, J., Tian, L., Shan, Y.: Improving low-precision network quantization via bin regularization. In: International Conference on Computer Vision (ICCV), pp. 5261–5270 (2021)

  16. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Computer Vision and Pattern Recognition (CVPR) (2016)

  17. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)

  18. Howard, A., et al.: Searching for MobileNetV3. In: International Conference on Computer Vision (ICCV) (2019)

  19. Howard, A., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)

  20. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Computer Vision and Pattern Recognition (CVPR) (2018)

  21. Jung, S., et al.: Learning to quantize deep networks by optimizing quantization intervals with task loss. In: Computer Vision and Pattern Recognition (CVPR) (2018)

  22. Kim, J., Bhalgat, Y., Lee, J., Patel, C., Kwak, N.: QKD: quantization-aware knowledge distillation. arXiv preprint arXiv:1911.12491 (2019)

  23. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  24. Krizhevsky, A.: Learning multiple layers of features from tiny images. Technical report (2009)

  25. Li, H., Xu, Z., Taylor, G., Studer, C., Goldstein, T.: Visualizing the loss landscape of neural nets. In: Advances in Neural Information Processing Systems (NeurIPS), vol. 31 (2018)

  26. Li, Y., Dong, X., Wang, W.: Additive powers-of-two quantization: an efficient non-uniform discretization for neural networks. In: International Conference on Learning Representations (ICLR) (2020)

  27. Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: International Conference on Learning Representations (ICLR) (2019)

  28. Liu, Z., Shen, Z., Li, S., Helwegen, K., Huang, D., Cheng, K.: How do Adam and training strategies help BNNs optimization. In: International Conference on Machine Learning (ICML) (2021)

  29. Liu, Z., Shen, Z., Savvides, M., Cheng, K.: ReActNet: towards precise binary neural network with generalized activation functions. In: European Conference on Computer Vision (ECCV) (2020)

  30. Ma, N., Zhang, X., Zheng, H., Sun, J.: ShuffleNet V2: practical guidelines for efficient CNN architecture design. In: European Conference on Computer Vision (ECCV) (2018)

  31. Ma, Y., et al.: OMPQ: orthogonal mixed precision quantization. arXiv preprint arXiv:2109.07865 (2021)

  32. Martinez, B., Yang, J., Bulat, A., Tzimiropoulos, G.: Training binary neural networks with real-to-binary convolutions. In: International Conference on Learning Representations (ICLR) (2020)

  33. Park, E., Yoo, S.: PROFIT: a novel training method for sub-4-bit MobileNet models. In: European Conference on Computer Vision (ECCV) (2020)

  34. Rastegari, M., Ordonez, V., Redmon, J., Farhadi, A.: XNOR-Net: ImageNet classification using binary convolutional neural networks. In: European Conference on Computer Vision (ECCV) (2016)

  35. Real, E., Aggarwal, A., Huang, Y., Le, Q.: Regularized evolution for image classifier architecture search. In: AAAI Conference on Artificial Intelligence, vol. 33, pp. 4780–4789 (2019)

  36. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.: MobileNetV2: inverted residuals and linear bottlenecks. In: Computer Vision and Pattern Recognition (CVPR) (2018)

  37. Uhlich, S., et al.: Mixed precision DNNs: all you need is a good parametrization. In: International Conference on Learning Representations (ICLR) (2020)

  38. Wang, K., Liu, Z., Lin, Y., Lin, J., Han, S.: HAQ: hardware-aware automated quantization with mixed precision. In: Computer Vision and Pattern Recognition (CVPR) (2019)

  39. Wang, T., et al.: APQ: joint search for network architecture, pruning and quantization policy. In: Computer Vision and Pattern Recognition (CVPR) (2020)

  40. Wu, B., Wang, Y., Zhang, P., Tian, Y., Vajda, P., Keutzer, K.: Mixed precision quantization of ConvNets via differentiable neural architecture search. arXiv preprint arXiv:1812.00090 (2018)

  41. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Computer Vision and Pattern Recognition (CVPR) (2017)

  42. Xie, S., Zheng, H., Liu, C., Lin, L.: SNAS: stochastic neural architecture search. In: International Conference on Learning Representations (ICLR) (2018)

  43. Yamamoto, K.: Learnable companding quantization for accurate low-bit neural networks. In: Computer Vision and Pattern Recognition (CVPR) (2021)

  44. You, S., Huang, T., Yang, M., Wang, F., Qian, C., Zhang, C.: GreedyNAS: towards fast one-shot NAS with greedy supernet. In: Computer Vision and Pattern Recognition (CVPR) (2020)

  45. Yu, H., Li, H., Shi, H., Huang, T., Hua, G.: Any-precision deep neural networks. arXiv preprint arXiv:1911.07346 (2019)

  46. Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: an extremely efficient convolutional neural network for mobile devices. In: Computer Vision and Pattern Recognition (CVPR) (2018)

  47. Zhang, Y., Pan, J., Liu, X., Chen, H., Chen, D., Zhang, Z.: FracBNN: accurate and FPGA-efficient binary neural networks with fractional activations. In: ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 171–182 (2021)

  48. Zhou, S., Ni, Z., Zhou, X., Wen, H., Wu, Y., Zou, Y.: DoReFa-Net: training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160 (2016)

  49. Zoph, B., Vasudevan, V., Shlens, J., Le, Q.: Learning transferable architectures for scalable image recognition. In: Computer Vision and Pattern Recognition (CVPR) (2018)

Acknowledgment

This work was supported by IITP and NRF grants funded by the Korea government (MSIT, 2021-0-00105, NRF-2021M3F3A2A02037893) and Samsung Electronics (Memory Division, SAIT, and SRFC-TC1603-04).

Author information

Corresponding author

Correspondence to Sungjoo Yoo.

Electronic supplementary material

Supplementary material 1 (pdf 575 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kim, HB., Park, E., Yoo, S. (2022). BASQ: Branch-wise Activation-clipping Search Quantization for Sub-4-bit Neural Networks. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13672. Springer, Cham. https://doi.org/10.1007/978-3-031-19775-8_2

  • DOI: https://doi.org/10.1007/978-3-031-19775-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19774-1

  • Online ISBN: 978-3-031-19775-8

  • eBook Packages: Computer Science (R0)
