Abstract
To improve inference accuracy, neural networks are growing rapidly in size and complexity, making the efficient deployment of complex task models on mobile devices a major challenge for industry today. Low-precision quantization is one of the key methods for achieving efficient inference on complex networks, but previous works often quantize only a subset of layers, because severe accuracy degradation occurs when quantization is applied to the entire network. To improve the stability and accuracy of low-precision quantization fine-tuning, we propose a hardware-friendly low-precision full quantization method, called DRGS, which dynamically selects the rounding mode for each weight according to the direction of its update during the forward pass of training and scales the corresponding gradient, ultimately quantizing all layers of a complex network to achieve floating-point-free inference. To validate the effectiveness of DRGS, we apply it to RetinaNet with full 4-bit quantization; results on the MS-COCO dataset show that DRGS achieves a 2.1% improvement in mAP, or at least 2X less quantization loss, compared to the state-of-the-art implementation. The improvement is also significant on YOLO, an object detection model family known for low run-time latency and efficiency. On the latest version of YOLO-v5s, the 4-bit fully quantized network reaches 33.4 mAP, which to our knowledge is the best result in this category.
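The core idea described in the abstract, choosing a per-weight rounding direction from the sign of the most recent weight update instead of always rounding to nearest, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function name, the use of the last update's sign as the direction signal, and the symmetric 4-bit range are all assumptions for the example, and the paper's gradient-scaling component is omitted.

```python
import numpy as np

def dynamic_round_quantize(w, w_update, scale, n_bits=4):
    """Fake-quantize weights with a dynamic rounding mode (illustrative sketch).

    w        : float weights
    w_update : the most recent update applied to each weight; its sign is
               taken here as the "direction of weight updates"
    scale    : quantization step size
    """
    qmin = -(2 ** (n_bits - 1))          # e.g. -8 for 4-bit
    qmax = 2 ** (n_bits - 1) - 1         # e.g. +7 for 4-bit
    x = w / scale
    # If the weight is moving up, round up; otherwise round down,
    # so the quantized value follows the direction of the update.
    q = np.where(w_update > 0, np.ceil(x), np.floor(x))
    q = np.clip(q, qmin, qmax)
    return q * scale

w = np.array([0.30, -0.42, 0.87])
upd = np.array([0.01, -0.02, 0.005])     # signs of the last updates
print(dynamic_round_quantize(w, upd, scale=0.25))  # -> [ 0.5 -0.5  1. ]
```

In a quantization-aware training loop, a fake-quantizer like this would sit in the forward pass with a straight-through estimator on the backward pass; the paper additionally scales that gradient, which this sketch does not attempt to reproduce.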
Acknowledgement
This work was partly supported by the National Key R&D Program of China under grant No. 2019YFB2204800.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wu, Q., Li, Y., Chen, S., Kang, Y. (2022). DRGS: Low-Precision Full Quantization of Deep Neural Network with Dynamic Rounding and Gradient Scaling for Object Detection. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2022. Communications in Computer and Information Science, vol 1744. Springer, Singapore. https://doi.org/10.1007/978-981-19-9297-1_11
DOI: https://doi.org/10.1007/978-981-19-9297-1_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-9296-4
Online ISBN: 978-981-19-9297-1
eBook Packages: Computer Science (R0)