Search What You Want: Barrier Panelty NAS for Mixed Precision Quantization

Yu, Haibao; Han, Qi; Li, Jianbo; Shi, Jianping; Cheng, Guangliang; Fan, Bin

doi:10.1007/978-3-030-58545-7_1

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12354)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Emerging hardware supports inference with mixed precision CNN models, which assign different bitwidths to different layers. Learning an optimal mixed precision model that preserves accuracy while satisfying specific constraints on model size and computation is extremely challenging, owing to the difficulty of training a mixed precision model and the huge space of possible bit quantizations.

In this paper, we propose a novel soft Barrier Penalty based NAS (BP-NAS) for mixed precision quantization, which ensures that all searched models lie inside the valid domain defined by the complexity constraint and can therefore return an optimal model under the given constraint with a single search. The proposed soft Barrier Penalty is differentiable; it imposes very large losses on models outside the valid domain and almost no penalty on models inside it, thus confining the search to the feasible domain. In addition, a differentiable Prob-1 regularizer is proposed to ensure that learning with NAS is reasonable. A distribution reshaping training strategy is also used to make training more stable. BP-NAS sets a new state of the art on both classification (CIFAR-10, ImageNet) and detection (COCO), surpassing all the efficient mixed precision methods designed manually and automatically. In particular, BP-NAS achieves higher mAP (up to a 2.7% improvement) together with lower bit computation cost than the best existing mixed precision model on COCO detection.
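To make the idea of a differentiable, constraint-aware penalty concrete, here is a minimal Python sketch. It is not the formulation from the paper, which this page does not reproduce: the softplus-based penalty, the product-form regularizer, and all names (`expected_cost`, `soft_barrier_penalty`, `prob1_regularizer`, `sharpness`) are assumptions chosen only to show the qualitative behaviour the abstract describes, namely near-zero loss inside the budget, steeply growing loss outside it, and a term that is smallest when one bitwidth choice has probability 1.

```python
import math

def softmax(logits):
    """Turn per-choice architecture logits into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_cost(layer_logits, layer_costs):
    """Expected complexity of the mixed precision model.

    layer_logits[i][k] is the logit for bitwidth choice k in layer i;
    layer_costs[i][k] is the (hypothetical) cost of that choice.
    """
    total = 0.0
    for logits, costs in zip(layer_logits, layer_costs):
        probs = softmax(logits)
        total += sum(p * c for p, c in zip(probs, costs))
    return total

def soft_barrier_penalty(cost, budget, sharpness=50.0):
    """Assumed stand-in penalty: a softplus of the relative constraint violation.

    It is smooth everywhere, close to zero when cost < budget, and grows
    roughly linearly (scaled by `sharpness`) once cost exceeds the budget.
    """
    z = sharpness * (cost - budget) / budget
    # Numerically stable softplus: log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def prob1_regularizer(probs):
    """Assumed product-form regularizer: zero exactly when some probability is 1."""
    result = 1.0
    for p in probs:
        result *= (1.0 - p)
    return result

# Hypothetical two-layer search space with two bitwidth choices per layer.
layer_costs = [[4.0, 16.0], [2.0, 8.0]]
budget = 12.0
cheap = expected_cost([[5.0, -5.0], [5.0, -5.0]], layer_costs)
costly = expected_cost([[-5.0, 5.0], [-5.0, 5.0]], layer_costs)
print(soft_barrier_penalty(cheap, budget), soft_barrier_penalty(costly, budget))
print(prob1_regularizer([0.5, 0.5]), prob1_regularizer([1.0, 0.0]))
```

In a real search these terms would be added to the task loss and differentiated with respect to the architecture logits; the pure-Python version above only illustrates the shape of the penalty.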

H. Yu and Q. Han contributed equally.

Notes

1. We also provide an example demonstrating this BOPs calculation process in the supplementary materials. Further formulations of BOPs can be found in [4, 23]; these consider memory bandwidth and may be more suitable for real applications.
2. 2.43MP uses the mixed precision quantizations searched by HAWQ [7].
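As a rough illustration of the kind of BOPs calculation the first note refers to, the sketch below uses a convention that is common in the quantization literature: the bit operations of a layer are its multiply-accumulate count times the weight bitwidth times the activation bitwidth. This is an assumption, not the formula from the paper or its supplementary material, and the function names and layer shapes are hypothetical.

```python
def conv_macs(c_in, c_out, kernel, out_h, out_w):
    """Multiply-accumulate operations of a standard (dense) convolution layer."""
    return c_in * c_out * kernel * kernel * out_h * out_w

def layer_bops(macs, weight_bits, act_bits):
    """Assumed convention: BOPs = MACs * weight bitwidth * activation bitwidth."""
    return macs * weight_bits * act_bits

def model_bops(layers):
    """Sum BOPs over (macs, weight_bits, act_bits) tuples, one per layer."""
    return sum(layer_bops(m, bw, ba) for m, bw, ba in layers)

# Hypothetical 3x3 convolution: 16 -> 32 channels on an 8x8 output map.
macs = conv_macs(c_in=16, c_out=32, kernel=3, out_h=8, out_w=8)
mixed = model_bops([(macs, 4, 4), (macs, 2, 4)])   # two layers, mixed bitwidths
full = model_bops([(macs, 32, 32), (macs, 32, 32)])  # same layers at 32 bits
print(macs, mixed, full, full / mixed)
```

Under this convention, halving either bitwidth of a layer halves its BOPs, which is why a per-layer bitwidth assignment can trade accuracy against a computation budget.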

References

1. Alizadeh, F.: Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM J. Optim. 5(1), 13–51 (1995)
2. Cai, H., Zhu, L., Han, S.: ProxylessNAS: direct neural architecture search on target task and hardware. In: ICLR (2019)
3. Cai, Z., He, X., Sun, J., Vasconcelos, N.: Deep learning with low precision by half-wave Gaussian quantization. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017
4. Chaim, B., Eli, S., Evgenii, Z., Natan, L., Raja, G., Alex, M.B., Avi, M.: UNIQ: uniform noise injection for non-uniform quantization of neural networks. arXiv preprint arXiv:1804.10969 (2018)
5. Choi, J., Wang, Z., Venkataramani, S., Chuang, P.I.J., Srinivasan, V., Gopalakrishnan, K.: PACT: parameterized clipping activation for quantized neural networks. arXiv preprint arXiv:1805.06085 (2018)
6. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
7. Dong, Z., Yao, Z., Gholami, A., Mahoney, M., Keutzer, K.: HAWQ: Hessian aware quantization of neural networks with mixed-precision. In: International Conference on Computer Vision (ICCV) (2019)
8. Gu, J., Zhao, J., Jiang, X., Zhang, B., Liu, J., Guo, G., Ji, R.: Bayesian optimized 1-bit CNNs. In: ICCV (2019)
9. Guo, Z., et al.: Single path one-shot neural architecture search with uniform sampling. In: ECCV (2020)
10. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition (2016)
11. Jacob, B., et al.: Quantization and training of neural networks for efficient integer-arithmetic-only inference. arXiv preprint arXiv:1712.05877 (2017)
12. Li, R., Wang, Y., Liang, F., Qin, H., Yan, J., Fan, R.: Fully quantized network for object detection (2019)
13. Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: ICLR (2019)
14. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
15. Moran, S., et al.: Robust quantization: one model to rule them all. arXiv preprint arXiv:2002.07686 (2020)
16. Nvidia: Nvidia tensor cores (2018)
17. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
18. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)
19. Wang, K., Liu, Z., Lin, Y., Lin, J., Han, S.: HAQ: hardware-aware automated quantization. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
20. Wang, P., et al.: Two-step quantization for low-bit neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4376–4384 (2018)
21. Wu, B., et al.: FBNet: hardware-aware efficient convnet design via differentiable neural architecture search. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10734–10742 (2019)
22. Wu, B., Wang, Y., Zhang, P., Tian, Y., Vajda, P., Keutzer, K.: Mixed precision quantization of ConvNets via differentiable neural architecture search. In: ICLR (2019)
23. Yochai, Z., et al.: Towards learning of filter-level heterogeneous compression of convolutional neural networks. In: ICML Workshop on AutoML (2019)
24. Yu, H., Wen, T., Cheng, G., Sun, J., Han, Q., Shi, J.: Low-bit quantization needs good distribution. In: CVPR Workshop on Efficient Deep Learning in Computer Vision (2020)
25. Zhang, D., Yang, J., Ye, D., Hua, G.: LQ-nets: learned quantization for highly accurate and compact deep neural networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 365–382 (2018)
26. Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2881–2890 (2017)
27. Zhou, A., Yao, A., Guo, Y., Xu, L., Chen, Y.: Incremental network quantization: towards lossless CNNs with low-precision weights. arXiv preprint arXiv:1702.03044 (2017)
28. Zhou, S., Wu, Y., Ni, Z., Zhou, X., Wen, H., Zou, Y.: DoReFa-net: training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160 (2016)
29. Zhu, C., Han, S., Mao, H., Dally, W.J.: Trained ternary quantization. arXiv preprint arXiv:1612.01064 (2016)

Acknowledgments

We thank Ligeng Zhu for the supportive feedback, and Kuntao Xiao for the meaningful discussion on solving constrained optimization problems. This work is supported by the National Natural Science Foundation of China (61876180), the Beijing Natural Science Foundation (4202073), and the Young Elite Scientists Sponsorship Program by CAST (2018QNRC001).

Author information

Authors and Affiliations

SenseTime Research, Beijing, China
Haibao Yu, Qi Han, Jianbo Li, Jianping Shi & Guangliang Cheng
Peking University, Beijing, China
Jianbo Li
University of Science and Technology Beijing, Beijing, China
Bin Fan

Authors

Haibao Yu
Qi Han
Jianbo Li
Jianping Shi
Guangliang Cheng
Bin Fan

Corresponding authors

Correspondence to Guangliang Cheng or Bin Fan.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 340 KB)

About this paper

Cite this paper

Yu, H., Han, Q., Li, J., Shi, J., Cheng, G., Fan, B. (2020). Search What You Want: Barrier Panelty NAS for Mixed Precision Quantization. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12354. Springer, Cham. https://doi.org/10.1007/978-3-030-58545-7_1

DOI: https://doi.org/10.1007/978-3-030-58545-7_1
Published: 05 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58544-0
Online ISBN: 978-3-030-58545-7
eBook Packages: Computer Science, Computer Science (R0)
