Abstract
Quantization is a promising approach to deploying deep neural networks on resource-limited devices. However, existing methods struggle to deliver both computation acceleration and parameter compression while maintaining high accuracy. To achieve this goal, we propose PSE, a mixed quantization framework that combines product quantization (PQ), scalar quantization (SQ), and error correction. Specifically, we first employ PQ to obtain the floating-point codebook and index matrix of the weight matrix. Then, we use SQ to quantize the codebook into integers and reconstruct an integer weight matrix. Finally, we propose an error correction algorithm that updates the quantized codebook to minimize the quantization error. We extensively evaluate our proposed method on various backbones, including VGG-16, ResNet-18/50, MobileNetV2, ShuffleNetV2, EfficientNet-B3/B7, and DenseNet-201, on the CIFAR-10 and ILSVRC-2012 benchmarks. The experiments demonstrate that PSE reduces computation complexity and model size with acceptable accuracy loss. For example, ResNet-18 achieves a 1.8\(\times\) acceleration ratio and a 30.4\(\times\) compression ratio with less than 1.54% accuracy loss on CIFAR-10.
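The first two stages described above (PQ to obtain a floating-point codebook and index matrix, then SQ to turn the codebook into integers) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names `kmeans`, `pq_sq_quantize`, and `reconstruct`, the plain k-means clustering, the choice of `n_sub` subspaces and `k` centroids, and the symmetric int8 scalar quantizer are all assumptions made for the example; the paper's error-correction stage is omitted.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means (illustrative): returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # squared Euclidean distance of every row to every centroid
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        a = d.argmin(1)
        for j in range(k):
            if (a == j).any():
                C[j] = X[a == j].mean(0)
    return C, a

def pq_sq_quantize(W, n_sub=4, k=16):
    """Stage 1 (PQ): split W column-wise into n_sub subspaces and cluster
    each, giving a float codebook and an index matrix.
    Stage 2 (SQ): scalar-quantize the float codebook to signed 8-bit
    integers with a single symmetric scale (an assumed quantizer)."""
    rows, cols = W.shape
    sub = cols // n_sub
    codebooks, indices = [], []
    for s in range(n_sub):
        block = W[:, s * sub:(s + 1) * sub]
        C, a = kmeans(block, k)
        codebooks.append(C)
        indices.append(a)
    codebook = np.stack(codebooks)              # (n_sub, k, sub), float
    index = np.stack(indices, axis=1)           # (rows, n_sub), int
    scale = np.abs(codebook).max() / 127.0
    q_codebook = np.clip(np.round(codebook / scale), -127, 127).astype(np.int8)
    return q_codebook, scale, index

def reconstruct(q_codebook, scale, index):
    """Rebuild an approximate weight matrix from the integer codebook,
    its scale, and the index matrix."""
    n_sub = q_codebook.shape[0]
    parts = [q_codebook[s][index[:, s]].astype(np.float32) * scale
             for s in range(n_sub)]
    return np.concatenate(parts, axis=1)
```

In this sketch only the small codebook is stored in floating point before SQ; after SQ the model holds an int8 codebook, one scale, and a compact index matrix, which is where the compression comes from.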
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China under Grant 62303405, in part by Ningbo Natural Science Foundation project under Grant 2023J400, and in part by Zhejiang Provincial Basic Public Welfare Research Project of China under Grant No. LGG22F030019.
Contributions
YY: Conceptualization, Software, Validation, Formal analysis, Data curation, Writing - original draft, Writing - review and editing, Visualization. GT: Resources, Supervision, Project administration, Funding acquisition. ML: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Project administration. YC: Writing - review and editing. JC: Conceptualization, Writing - review and editing. LM: Resources, Funding acquisition. YL: Resources, Funding acquisition. YP: Resources, Funding acquisition. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Yang, Y., Tian, G., Liu, M. et al. Pse: mixed quantization framework of neural networks for efficient deployment. J Real-Time Image Proc 20, 113 (2023). https://doi.org/10.1007/s11554-023-01366-9