
Compression of Deep Neural Networks with Structured Sparse Ternary Coding

Published in the Journal of Signal Processing Systems

Abstract

Deep neural networks (DNNs) contain a large number of weights and usually require many off-chip memory accesses for inference. Weight size compression is a major requirement for on-chip-memory-based implementations of DNNs, which not only increases the inference speed but also reduces the power consumption. We propose a weight compression method for deep neural networks that combines pruning and quantization. The proposed method allows weights to take the values +1 or −1 only at predetermined positions, and a look-up table stores all possible combinations of the sub-vectors of the weight matrices. With this table, the structured sparse weights can be encoded and decoded easily. The method not only enables multiplication-free DNN implementations but also compresses the weight storage by as much as 32 times compared with floating-point networks, with only a small performance loss. Weight distribution normalization and gradual pruning techniques are applied to reduce the performance degradation. Experiments are conducted with fully connected DNNs and convolutional neural networks.
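
As an illustration of the coding scheme described above, the following is a minimal sketch, not the authors' implementation: it assumes sub-vectors of length 4 with a single nonzero weight restricted to +1 or −1, and the names build_codebook, encode, and decode are illustrative. It shows how a look-up table can enumerate all admissible structured sparse ternary sub-vectors so that each sub-vector is stored as a small integer index.

    import itertools
    import numpy as np

    # Assumed (illustrative) parameters: each weight sub-vector has
    # length 4 and exactly one nonzero entry restricted to {+1, -1}.
    SUBVEC_LEN, NONZEROS = 4, 1

    def build_codebook(length, nonzeros):
        """Enumerate every admissible structured sparse ternary sub-vector."""
        codebook = []
        for positions in itertools.combinations(range(length), nonzeros):
            for signs in itertools.product((-1.0, 1.0), repeat=nonzeros):
                v = np.zeros(length)
                v[list(positions)] = signs
                codebook.append(v)
        return np.stack(codebook)

    CODEBOOK = build_codebook(SUBVEC_LEN, NONZEROS)   # shape: (8, 4)

    def encode(subvector):
        """Quantize a float sub-vector to the index of its nearest codeword."""
        dists = np.linalg.norm(CODEBOOK - np.asarray(subvector), axis=1)
        return int(np.argmin(dists))

    def decode(index):
        """Recover the structured sparse ternary sub-vector from its index."""
        return CODEBOOK[index]

    # A float sub-vector is stored only as a small integer index.
    w = np.array([0.07, -0.81, 0.02, 0.10])
    idx = encode(w)
    print(idx, decode(idx))          # prints: 2 [ 0. -1.  0.  0.]

Under these assumptions, each length-4 sub-vector needs only a 3-bit index (8 codewords) instead of 4 × 32 bits, and the decoded ±1 entries contribute to a matrix-vector product through additions and subtractions alone, which is the multiplication-free property mentioned in the abstract; the sub-vector length, sparsity level, and index width actually used in the paper determine the reported compression ratio.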


References

  1. Redmon, J., Divvala, S., Girshick, R., Farhadi, A. (2016). You only look once: unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779–788).

  2. Amodei, D., Anubhai, R., Battenberg, E., Case, C., Casper, J., Catanzaro, B., Chen, J., Chrzanowski, M., Coates, A., Diamos, G., et al (2016). Deep Speech 2: end-to-end speech recognition in English and Mandarin. In Proceedings of the 33rd international conference on machine learning (ICML).

  3. Hong, S., & Kim, H. (2010). An integrated GPU power and performance model. ACM SIGARCH Computer Architecture News, 38(3), 280–289.

  4. Hwang, K., & Sung, W. (2014). Fixed-point feedforward deep neural network design using weights +1, 0, and −1. In 2014 IEEE workshop on signal processing systems (SiPS) (pp. 1–6). IEEE.

  5. Li, F., Zhang, B., Liu, B. (2016). Ternary weight networks. arXiv:1605.04711.

  6. Zhu, C., Han, S., Mao, H., Dally, W.J. (2017). Trained ternary quantization. In International conference on learning representations (ICLR).

  7. Kim, J., Hwang, K., Sung, W. (2014). X1000 real-time phoneme recognition VLSI using feed-forward deep neural networks. In 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 7510–7514). IEEE.

  8. Park, J., & Sung, W. (2016). FPGA-based implementation of deep neural networks using on-chip memory only. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 1011–1015). IEEE.

  9. Rastegari, M., Ordonez, V., Redmon, J., Farhadi, A. (2016). XNOR-Net: ImageNet classification using binary convolutional neural networks. In European conference on computer vision (ECCV) (pp. 525–542). Springer.

  10. Courbariaux, M., Bengio, Y., David, J.-P. (2015). BinaryConnect: training deep neural networks with binary weights during propagations. In Advances in neural information processing systems (pp. 3123–3131).

  11. Han, S., Mao, H., Dally, W.J. (2016). Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. In International conference on learning representations (ICLR).

  12. Han, S., Pool, J., Tran, J., Dally, W. (2015). Learning both weights and connections for efficient neural network. In Advances in neural information processing systems (pp. 1135–1143).

  13. See, A., Luong, M.-T., Manning, C.D. (2016). Compression of neural machine translation models via pruning. In Proceedings of The 20th SIGNLL conference on computational natural language learning (pp. 291–301).

  14. Yu, D., Seide, F., Li, G., Deng, L. (2012). Exploiting sparseness in deep neural networks for large vocabulary speech recognition. In 2012 IEEE International conference on acoustics, speech and signal processing (ICASSP) (pp. 4409–4412). IEEE.

  15. Narang, S., Diamos, G., Sengupta, S., Elsen, E. (2017). Exploring sparsity in recurrent neural networks. In International conference on learning representations (ICLR).

  16. Anwar, S., Hwang, K., Sung, W. (2017). Structured pruning of deep convolutional neural networks. ACM Journal on Emerging Technologies in Computing Systems (JETC), 13(3).

  17. Wen, W., Wu, C., Wang, Y., Chen, Y., Li, H. (2016). Learning structured sparsity in deep neural networks. In Advances in neural information processing systems (pp. 2074–2082).

  18. Molchanov, P., Tyree, S., Karras, T., Aila, T., Kautz, J. (2017). Pruning convolutional neural networks for resource efficient transfer learning. In International conference on learning representations (ICLR).

  19. Sung, W., Shin, S., Hwang, K. (2015). Resiliency of deep neural networks under quantization. arXiv:1511.06488.

  20. Krizhevsky, A., Sutskever, I., Hinton, G.E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).

  21. Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International conference on learning representations (ICLR).

  22. Han, S., Liu, X., Mao, H., Pu, J., Pedram, A., Horowitz, M.A., Dally, W.J. (2016). EIE: efficient inference engine on compressed deep neural network. In Proceedings of the 43rd international symposium on computer architecture (ISCA) (pp. 243–254). IEEE Press.

  23. Boo, Y., & Sung, W. (2017). Structured sparse ternary weight coding of deep neural networks for efficient hardware implementations. In 2017 IEEE International workshop on signal processing systems (SiPS). IEEE.

  24. Denil, M., Shakibi, B., Dinh, L., De Freitas, N., et al (2013). Predicting parameters in deep learning. In Advances in neural information processing systems (pp. 2148–2156).

  25. Ioffe, S., & Szegedy, C. (2015). Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd international conference on machine learning (ICML) (pp. 448–456).

  26. Salimans, T., & Kingma, D.P. (2016). Weight normalization: a simple reparameterization to accelerate training of deep neural networks. In Advances in neural information processing systems (pp. 901–909).

  27. Shin, S., Boo, Y., Sung, W. (2017). Fixed-point optimization of deep neural networks with adaptive step size retraining. In 2017 IEEE International conference on acoustics, speech and signal processing (ICASSP).

  28. Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images. Technical report, University of Toronto.

  29. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.

  30. Marcus, M.P., Marcinkiewicz, M.A., Santorini, B. (1993). Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2), 313–330.

  31. Katz, S. (1987). Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech, and Signal Processing, 35(3), 400–401.

  32. Projects: infimnist. http://leon.bottou.org/projects/infimnist. Accessed 01 May 2017.

  33. Kingma, D.P., & Ba, J. (2015). Adam: a method for stochastic optimization. In International conference on learning representations (ICLR).

  34. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.

  35. Vedaldi, A., & Lenc, K. (2015). MatConvNet: convolutional neural networks for MATLAB. In Proceedings of the 23rd ACM international conference on multimedia (pp. 689–692). ACM.

Acknowledgements

This work is supported in part by the Brain Korea 21 Plus Project and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2015R1A2A1A10056051). This work is also supported by Samsung Advanced Institute of Technology.

Author information

Corresponding author

Correspondence to Yoonho Boo.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This paper is an extended version of “Structured Sparse Ternary Weight Coding of Deep Neural Networks for Efficient Hardware Implementations,” presented at the IEEE International Workshop on Signal Processing Systems (SiPS) 2017 [23].

Cite this article

Boo, Y., Sung, W. Compression of Deep Neural Networks with Structured Sparse Ternary Coding. J Sign Process Syst 91, 1009–1019 (2019). https://doi.org/10.1007/s11265-018-1418-z
