
aHCQ: Adaptive Hierarchical Clustering Based Quantization Framework for Deep Neural Networks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12713)

Abstract

For deep neural networks (DNNs), high model accuracy is usually the main focus. However, millions of model parameters commonly lead to high space overheads, especially due to parameter redundancy. By representing network weights with fewer bits, network quantization has been used to compress DNNs at lower space cost. However, existing quantization methods cannot optimally balance model size against accuracy, and thus suffer accuracy loss to varying degrees. Moreover, although a few existing quantization techniques can adaptively determine per-layer quantization bit-widths, they either give little consideration to the relations between different DNN layers or are designed for specialized hardware environments that are not broadly available. To overcome these issues, we propose an adaptive Hierarchical Clustering based Quantization (aHCQ) framework. aHCQ finds a heavily compressed model by quantizing each layer while incurring only a small loss in model accuracy. Experiments show that aHCQ achieves \(11.4\times\) and \(8.2\times\) model compression rates with only around a \(0.5\%\) drop in model accuracy.
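
The abstract does not spell out the algorithm, but the core idea of clustering a layer's weights into a small set of shared values can be illustrated with a short sketch. The snippet below is a hypothetical, minimal example rather than the authors' implementation: it uses SciPy's agglomerative (Ward) hierarchical clustering to group one layer's weights into \(2^b\) clusters for an assumed bit-width \(b\), stores each weight as a cluster index, and reconstructs an approximate layer from the resulting codebook. The adaptive per-layer bit-width selection described in the paper is not modeled here, and the function names (`quantize_layer`, `dequantize_layer`) are illustrative only.

```python
# Minimal sketch (not the authors' implementation): quantize one layer's weights
# by hierarchical clustering, then store each weight as a small integer code.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def quantize_layer(weights: np.ndarray, bit_width: int):
    """Cluster the layer's weights into 2**bit_width groups and return
    (codebook of cluster centroids, integer code for every weight)."""
    flat = weights.reshape(-1, 1)
    n_clusters = 2 ** bit_width                      # b bits -> 2^b shared values
    tree = linkage(flat, method="ward")              # agglomerative (hierarchical) clustering
    labels = fcluster(tree, t=n_clusters, criterion="maxclust") - 1  # 0-based cluster ids
    codebook = np.array([flat[labels == k].mean() for k in range(labels.max() + 1)])
    codes = labels.reshape(weights.shape).astype(np.uint8)
    return codebook, codes


def dequantize_layer(codebook: np.ndarray, codes: np.ndarray) -> np.ndarray:
    """Rebuild an approximate weight tensor from the codebook and codes."""
    return codebook[codes]


# Example: 4-bit quantization of a random "layer".
rng = np.random.default_rng(0)
w = rng.normal(size=(32, 32))
codebook, codes = quantize_layer(w, bit_width=4)
w_hat = dequantize_layer(codebook, codes)
print("reconstruction MSE:", float(np.mean((w - w_hat) ** 2)))
```

Stored this way, each 32-bit float weight is replaced by a \(b\)-bit index plus a small shared codebook, so a uniform 3- or 4-bit assignment already gives roughly \(8\times\) to \(10\times\) compression, which is the regime of the \(11.4\times\) and \(8.2\times\) rates reported in the abstract.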

Acknowledgment

We would like to thank all reviewers for their comments. This work was partially supported by the National Natural Science Foundation of China (Grant No. 61972286) and by the Natural Science Foundation of Shanghai, China (No. 20ZR1460500).

Author information

Correspondence to Weixiong Rao or Qinpei Zhao.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Hu, J., Rao, W., Zhao, Q. (2021). aHCQ: Adaptive Hierarchical Clustering Based Quantization Framework for Deep Neural Networks. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science (LNAI), vol 12713. Springer, Cham. https://doi.org/10.1007/978-3-030-75765-6_17

  • DOI: https://doi.org/10.1007/978-3-030-75765-6_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-75764-9

  • Online ISBN: 978-3-030-75765-6
