Abstract
For deep neural networks (DNNs), high model accuracy is usually the main focus. However, models with millions of parameters incur high storage overheads, much of which stems from parameter redundancy. Network quantization compresses DNNs by storing network weights at lower bit-widths. However, existing quantization methods cannot optimally balance model size against accuracy, so they all suffer some loss of accuracy. Moreover, although a few existing quantization techniques can adaptively determine per-layer bit-widths, they either give little consideration to the relations between DNN layers or are designed for specialized hardware environments that are not broadly applicable. To overcome these issues, we propose an adaptive Hierarchical Clustering based Quantization (aHCQ) framework. aHCQ finds a strongly compressed model by quantizing each layer while incurring only a small loss in accuracy. Experiments show that aHCQ achieves \(11.4\times \) and \(8.2\times \) model compression rates with only around a \(0.5\%\) drop in accuracy.
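To make the idea of clustering-based quantization concrete, the following is a minimal illustrative sketch, not the paper's exact aHCQ algorithm: a layer's weights are grouped by bottom-up (agglomerative) merging of the closest adjacent clusters, each weight is replaced by its cluster centroid, and the resulting codebook size determines the bit-width needed to index it. The function name and the stopping criterion (a fixed cluster count per layer) are assumptions for illustration only.

```python
import numpy as np

def hierarchical_quantize(weights, n_clusters):
    """Quantize a weight array to n_clusters shared values via simple
    agglomerative clustering on the sorted unique weights.
    Illustrative sketch only; not the paper's aHCQ procedure."""
    w = np.sort(np.unique(weights.ravel()))
    clusters = [[v] for v in w]  # start with one cluster per distinct value
    while len(clusters) > n_clusters:
        # merge the pair of adjacent clusters whose centroids are closest
        centroids = [np.mean(c) for c in clusters]
        i = int(np.argmin(np.diff(centroids)))
        clusters[i] = clusters[i] + clusters[i + 1]
        del clusters[i + 1]
    # codebook: one centroid per cluster; map each weight to its nearest entry
    codebook = np.array([np.mean(c) for c in clusters])
    idx = np.argmin(np.abs(weights.ravel()[:, None] - codebook[None, :]), axis=1)
    bit_width = int(np.ceil(np.log2(len(codebook))))
    return codebook[idx].reshape(weights.shape), bit_width
```

With 16 clusters, each weight index fits in 4 bits; an adaptive scheme such as aHCQ would instead choose a different cluster count (and hence bit-width) per layer rather than fixing it globally.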
Acknowledgment
We would like to thank all reviewers for their comments. This work was partially supported by the National Natural Science Foundation of China (Grant No. 61972286) and by the Natural Science Foundation of Shanghai, China (No. 20ZR1460500).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Hu, J., Rao, W., Zhao, Q. (2021). aHCQ: Adaptive Hierarchical Clustering Based Quantization Framework for Deep Neural Networks. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science(), vol 12713. Springer, Cham. https://doi.org/10.1007/978-3-030-75765-6_17
DOI: https://doi.org/10.1007/978-3-030-75765-6_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75764-9
Online ISBN: 978-3-030-75765-6
eBook Packages: Computer Science, Computer Science (R0)