
DRGS: Low-Precision Full Quantization of Deep Neural Network with Dynamic Rounding and Gradient Scaling for Object Detection

  • Conference paper
Data Mining and Big Data (DMBD 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1744)


Abstract

To improve inference accuracy, neural networks are growing rapidly in size and complexity, making the efficient deployment of complex task models on mobile devices a major challenge for industry today. Low-precision quantization is one of the key methods for achieving efficient inference on complex networks, but previous works often quantize only some layers, because severe accuracy degradation occurs when quantization is applied to the entire network. To improve the stability and accuracy of low-precision quantization fine-tuning, we propose a hardware-friendly low-precision full quantization method, called DRGS, which dynamically selects the rounding mode for weights according to the direction of weight updates during the training forward pass and scales the corresponding gradients, ultimately quantizing all layers of a complex network to achieve floating-point-free inference. To validate the effectiveness of DRGS, we apply it to RetinaNet with full 4-bit quantization; results on the MS-COCO dataset show that DRGS achieves a 2.1% improvement in mAP, or at least 2x less quantization loss, compared to state-of-the-art implementations. This improvement is also significant on YOLO, an object detection model family known for low run-time latency and efficiency. On the latest version, YOLO-v5s, the 4-bit fully quantized network reaches 33.4 mAP, which to our knowledge is the best mAP achieved in this category.
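The core idea described in the abstract, choosing a per-weight rounding mode from the direction of the weight update rather than always rounding to nearest, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the use of the update sign as the rounding signal, and the omission of the companion gradient-scaling step are all illustrative assumptions.

```python
import numpy as np

def quantize_dynamic_rounding(w, update_dir, scale, n_bits=4):
    """Fake-quantize weights with a dynamically chosen rounding mode.

    Illustrative sketch only: each weight is rounded toward the
    direction its update is pushing it (ceil for a positive update,
    floor for a negative one), instead of always rounding to nearest.
    The gradient-scaling half of DRGS is not modeled here.
    """
    qmin = -(2 ** (n_bits - 1))          # e.g. -8 for 4-bit
    qmax = 2 ** (n_bits - 1) - 1         # e.g.  7 for 4-bit
    x = w / scale
    # Pick the rounding mode per weight from the sign of its update.
    q = np.where(update_dir > 0, np.ceil(x), np.floor(x))
    q = np.clip(q, qmin, qmax)
    return q * scale                     # dequantized ("fake-quantized") weights

w = np.array([0.30, -0.30, 0.71])
direction = np.array([1.0, -1.0, 1.0])   # sign of the last weight update
print(quantize_dynamic_rounding(w, direction, scale=0.5))  # [ 0.5 -0.5  1. ]
```

In quantization-aware training, such a fake-quantized forward pass is typically paired with a straight-through estimator on the backward pass; DRGS additionally scales that gradient, which this sketch leaves out.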



Acknowledgement

This work was partly supported by the National Key R&D Program of China under grant No. 2019YFB2204800.

Author information

Corresponding author

Correspondence to Yi Kang.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wu, Q., Li, Y., Chen, S., Kang, Y. (2022). DRGS: Low-Precision Full Quantization of Deep Neural Network with Dynamic Rounding and Gradient Scaling for Object Detection. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2022. Communications in Computer and Information Science, vol 1744. Springer, Singapore. https://doi.org/10.1007/978-981-19-9297-1_11


  • DOI: https://doi.org/10.1007/978-981-19-9297-1_11

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-9296-4

  • Online ISBN: 978-981-19-9297-1

  • eBook Packages: Computer Science (R0)
