Abstract
For deep neural networks (DNNs), high model accuracy is usually the main focus. However, models with millions of parameters incur high storage overheads, much of which stems from parameter redundancy. Network quantization compresses DNNs by storing network weights at lower bit-widths. However, existing quantization methods cannot optimally balance model size against accuracy, so they all suffer some loss of accuracy. Moreover, although a few existing quantization techniques can adaptively determine per-layer bit-widths, they either give little consideration to the relations between DNN layers or are designed for specialized hardware environments that are not broadly applicable. To overcome these issues, we propose an adaptive Hierarchical Clustering based Quantization (aHCQ) framework. aHCQ finds a strongly compressed model by quantizing each layer while incurring only a small loss in accuracy. Experiments show that aHCQ achieves \(11.4\times \) and \(8.2\times \) model compression rates with only around a \(0.5\%\) drop in accuracy.
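To make the idea of clustering-based quantization concrete, the following is a minimal illustrative sketch, not the paper's exact aHCQ algorithm: a layer's weights are grouped by bottom-up (agglomerative) merging of the closest adjacent clusters, each weight is replaced by its cluster centroid, and the resulting codebook size determines the bit-width needed to index it. The function name and the stopping criterion (a fixed cluster count per layer) are assumptions for illustration only.

```python
import numpy as np

def hierarchical_quantize(weights, n_clusters):
    """Quantize a weight array to n_clusters shared values via simple
    agglomerative clustering on the sorted unique weights.
    Illustrative sketch only; not the paper's aHCQ procedure."""
    w = np.sort(np.unique(weights.ravel()))
    clusters = [[v] for v in w]  # start with one cluster per distinct value
    while len(clusters) > n_clusters:
        # merge the pair of adjacent clusters whose centroids are closest
        centroids = [np.mean(c) for c in clusters]
        i = int(np.argmin(np.diff(centroids)))
        clusters[i] = clusters[i] + clusters[i + 1]
        del clusters[i + 1]
    # codebook: one centroid per cluster; map each weight to its nearest entry
    codebook = np.array([np.mean(c) for c in clusters])
    idx = np.argmin(np.abs(weights.ravel()[:, None] - codebook[None, :]), axis=1)
    bit_width = int(np.ceil(np.log2(len(codebook))))
    return codebook[idx].reshape(weights.shape), bit_width
```

With 16 clusters, each weight index fits in 4 bits; an adaptive scheme such as aHCQ would instead choose a different cluster count (and hence bit-width) per layer rather than fixing it globally.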
Acknowledgment
We would like to thank all reviewers for their comments. This work was partially supported by the National Natural Science Foundation of China (Grant No. 61972286) and by the Natural Science Foundation of Shanghai, China (No. 20ZR1460500).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Hu, J., Rao, W., Zhao, Q. (2021). aHCQ: Adaptive Hierarchical Clustering Based Quantization Framework for Deep Neural Networks. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science(), vol 12713. Springer, Cham. https://doi.org/10.1007/978-3-030-75765-6_17
DOI: https://doi.org/10.1007/978-3-030-75765-6_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75764-9
Online ISBN: 978-3-030-75765-6
eBook Packages: Computer Science, Computer Science (R0)