Abstract
The layers and hyper-parameters of deep neural networks keep growing as ever-larger models are trained on massive datasets, which makes these models difficult to deploy on resource-constrained edge devices. Mixed-precision quantization is a promising way to prune and compress deep neural network models, but discovering the optimal bit width for each layer remains challenging. To address this challenge, we propose dynamic pseudo-mean mixed-precision quantization (DPQ), which introduces two-bit scaling factors to compensate for quantization errors. We further propose an activation quantization scheme named random parameters clipping (RPC), which quantizes only part of the activations to reduce the loss of accuracy. DPQ dynamically adjusts the bit precision of weight quantization according to the distribution of the weights, yielding a quantization scheme that is more robust than previous methods. Extensive experiments demonstrate that DPQ achieves a 15.43\(\times\) compression rate for ResNet20 on the CIFAR-10 dataset with a 0.22% increase in accuracy, and a 35.25\(\times\) compression rate for ResNet56 on the SVHN dataset with a 0.12% increase in accuracy.
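To make the two ingredients of the abstract concrete, the sketch below shows generic per-tensor uniform weight quantization with a scale factor, and clipping of a random subset of activations. This is an illustrative approximation only, not the authors' DPQ/RPC implementation: the function names, the `fraction` parameter, and the per-tensor scale are assumptions for the example, and DPQ's dynamic per-layer bit-width selection and two-bit error-compensating scaling factors are described in the paper itself.

```python
import numpy as np

def quantize_weights(w, bits):
    """Uniformly quantize a weight tensor to the given bit width.

    Generic fake quantization for illustration; DPQ additionally chooses
    `bits` per layer from the weight distribution and adds two-bit
    scaling factors to compensate quantization error (not shown here).
    """
    qmax = 2 ** (bits - 1) - 1          # symmetric signed range, e.g. 7 for 4 bits
    scale = np.max(np.abs(w)) / qmax    # per-tensor scale factor (assumed)
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                    # dequantized ("fake-quantized") weights

def clip_activations(a, alpha, fraction=0.5, rng=np.random.default_rng(0)):
    """Clip a random subset of activations to [0, alpha].

    Loosely in the spirit of partial activation quantization (RPC):
    only `fraction` of the activations are clipped, which reduces the
    accuracy loss. `alpha` and `fraction` are hypothetical parameters.
    """
    mask = rng.random(a.shape) < fraction
    return np.where(mask, np.clip(a, 0.0, alpha), a)

# Toy usage: quantize a random weight matrix to 4 bits and inspect the error.
w = np.random.randn(64, 64).astype(np.float32)
w_q = quantize_weights(w, bits=4)
print("mean quantization error:", np.abs(w - w_q).mean())
```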
Code availability
The source code of the relevant algorithms is referenced in the paper. The code for reproducing the experiments will be made available by the authors.
Acknowledgements
The authors would like to thank the anonymous reviewers for their invaluable comments. This work was partially funded by the National Natural Science Foundation of China under Grant No. 61975124, Shanghai Natural Science Foundation (20ZR1438500), State Key Laboratory of Computer Architecture (ICT, CAS) under Grant No. CARCHA202111, and Engineering Research Center of Software/Hardware Co-design Technology and Application, Ministry of Education, East China Normal University under Grant No. OP202202. Any opinions, findings and conclusions expressed in this paper are those of the authors and do not necessarily reflect the views of the sponsors.
Funding
This work was partially funded by the National Natural Science Foundation of China under Grant No. 61975124, Shanghai Natural Science Foundation (20ZR1438500), State Key Laboratory of Computer Architecture (ICT, CAS) under Grant No. CARCHA202111, and Engineering Research Center of Software/Hardware Co-design Technology and Application, Ministry of Education, East China Normal University under Grant No. OP202202.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by MC, XY, HX and WQ. The method was proposed by SP, JW and BZ. The first draft of the manuscript was written by SP and JW. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Consent to participate
All authors agreed with the content, and all gave explicit consent to submit.
Consent for publication
All authors consent to submission and publication.
Ethics approval
The manuscript has not been submitted to more than one journal for simultaneous consideration. This work is original and has not been published elsewhere in any form or language.
Additional information
Editors: Feida Zhu, Bin Yang, João Gama
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Pei, S., Wang, J., Zhang, B. et al. DPQ: dynamic pseudo-mean mixed-precision quantization for pruned neural network. Mach Learn 113, 4099–4112 (2024). https://doi.org/10.1007/s10994-023-06453-3