Bit-Quantized-Net: An Effective Method for Compressing Deep Neural Networks

Mobile Networks and Applications

Abstract

Deep neural networks have achieved state-of-the-art performance in a wide range of scenarios, such as natural language processing, object detection, image classification, and speech recognition. While showing impressive results across these machine learning tasks, neural network models remain computationally expensive and memory intensive for parameter training and storage in mobile service scenarios. As a result, how to simplify and accelerate neural networks is a crucial research topic. To address this issue, in this paper we propose "Bit-Quantized-Net" (BQ-Net), which compresses deep neural networks at both the training phase and test-time inference, and further reduces the model size by compressing the bit-quantized weights. Specifically, training or testing a plain neural network model executes the computation y = wx + b tens of millions of times. In BQ-Net, the model instead approximates y = wx + b by y = sign(w)(x ≫ |w|) + b during forward propagation. That is, BQ-Net trains the network with bit-quantized weights during forward propagation, while retaining the full-precision weights for gradient accumulation during backward propagation. Finally, we apply Huffman coding to encode the bit-shift weights, which compresses the model size further. Extensive experiments on three real data sets (MNIST, CIFAR-10, SVHN) show that BQ-Net can achieve 10-14× model compression.
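
To make the forward approximation concrete, the sketch below illustrates, under our own assumptions, how the multiply in y = wx + b can be replaced by a sign flip plus a power-of-two (bit-shift) scaling. This is not the authors' released code: the function names, the NumPy float simulation of the shift, and the max_shift clamp are illustrative choices only.

```python
import numpy as np

def quantize_to_shift(w, max_shift=7):
    """Approximate each real-valued weight by sign(w) * 2**(-shift),
    where shift is a small non-negative integer (the bit-shift amount).
    The max_shift clamp is an assumed hyperparameter, not taken from the paper."""
    sign = np.sign(w)
    # Guard against log2(0): weights near zero receive the largest shift.
    shift = np.round(-np.log2(np.maximum(np.abs(w), 2.0 ** -max_shift)))
    return sign, np.clip(shift, 0, max_shift).astype(np.int8)

def forward_bitshift(x, w, b, max_shift=7):
    """Forward pass y ~ sign(w)(x >> |w|) + b: each multiply by a weight
    becomes a sign flip plus a power-of-two scaling (a bit shift on integers)."""
    sign, shift = quantize_to_shift(w, max_shift)
    w_q = sign * 2.0 ** (-shift.astype(np.float64))  # shift-only weights
    return w_q @ x + b

# Toy usage: compare the shift-based forward pass with the exact one.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.5, size=(4, 8))
x = rng.normal(size=8)
b = np.zeros(4)
print(forward_bitshift(x, w, b))  # bit-quantized forward output
print(w @ x + b)                  # full-precision reference
```

Per the abstract, backward propagation would still accumulate gradients into the full-precision weights; only the forward pass uses the shift-quantized form, and the integer shift values are what Huffman coding later compresses.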


Acknowledgements

Dianhui Chu (cdh@hit.edu.cn) and Jinhui Zhu (csjhzhu@scut.edu.cn) are co-corresponding authors. This work was supported in part by the National Key Research and Development Program of China (No. 2018YFB1402500), the National Natural Science Foundation of China (Nos. 61902090 and 61772159), and the University Co-construction Project.

Author information

Corresponding author

Correspondence to Dianhui Chu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest, and no commercial or associative interest that represents a conflict of interest in connection with the work entitled "Bit-Quantized-Net: An Effective Method for Compressing Deep Neural Networks".

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, C., Du, Q., Xu, X. et al. Bit-Quantized-Net: An Effective Method for Compressing Deep Neural Networks. Mobile Netw Appl 26, 104–113 (2021). https://doi.org/10.1007/s11036-020-01687-0
