Abstract
To improve inference accuracy, neural networks are growing rapidly in size and complexity, making the efficient deployment of complex task models on mobile devices a major challenge for industry today. Low-precision quantization is one of the key methods for achieving efficient inference on complex networks, but previous works often quantize only a subset of layers, because severe accuracy degradation occurs when quantization is applied to the entire network. To improve the stability and accuracy of low-precision quantization fine-tuning, we propose a hardware-friendly low-precision full quantization method, called DRGS, which dynamically selects the rounding mode for each weight according to the direction of its update during the forward pass of training and scales the corresponding gradient, ultimately quantizing all layers of a complex network to achieve floating-point-free inference. To validate the effectiveness of DRGS, we apply it to RetinaNet with full 4-bit quantization; results on the MS-COCO dataset show that DRGS achieves a 2.1% improvement in mAP, or at least 2X less quantization loss, compared to the state-of-the-art implementation. The improvement is also significant on YOLO, an object detection model family known for low run-time latency and efficiency. On the latest version of YOLO-v5s, the 4-bit fully quantized network reaches 33.4 mAP, which to our knowledge is the best result in this category.
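The core idea described in the abstract, choosing a per-weight rounding direction from the sign of the most recent weight update instead of always rounding to nearest, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function name, the use of the last update's sign as the direction signal, and the symmetric 4-bit range are all assumptions for the example, and the paper's gradient-scaling component is omitted.

```python
import numpy as np

def dynamic_round_quantize(w, w_update, scale, n_bits=4):
    """Fake-quantize weights with a dynamic rounding mode (illustrative sketch).

    w        : float weights
    w_update : the most recent update applied to each weight; its sign is
               taken here as the "direction of weight updates"
    scale    : quantization step size
    """
    qmin = -(2 ** (n_bits - 1))          # e.g. -8 for 4-bit
    qmax = 2 ** (n_bits - 1) - 1         # e.g. +7 for 4-bit
    x = w / scale
    # If the weight is moving up, round up; otherwise round down,
    # so the quantized value follows the direction of the update.
    q = np.where(w_update > 0, np.ceil(x), np.floor(x))
    q = np.clip(q, qmin, qmax)
    return q * scale

w = np.array([0.30, -0.42, 0.87])
upd = np.array([0.01, -0.02, 0.005])     # signs of the last updates
print(dynamic_round_quantize(w, upd, scale=0.25))  # -> [ 0.5 -0.5  1. ]
```

In a quantization-aware training loop, a fake-quantizer like this would sit in the forward pass with a straight-through estimator on the backward pass; the paper additionally scales that gradient, which this sketch does not attempt to reproduce.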
Acknowledgement
This work was partly supported by the National Key R&D Program of China under grant No. 2019YFB2204800.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wu, Q., Li, Y., Chen, S., Kang, Y. (2022). DRGS: Low-Precision Full Quantization of Deep Neural Network with Dynamic Rounding and Gradient Scaling for Object Detection. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2022. Communications in Computer and Information Science, vol 1744. Springer, Singapore. https://doi.org/10.1007/978-981-19-9297-1_11
DOI: https://doi.org/10.1007/978-981-19-9297-1_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-9296-4
Online ISBN: 978-981-19-9297-1
eBook Packages: Computer Science (R0)