ABSTRACT
Recent studies have shown that the effect of quantization on the accuracy of a convolutional neural network varies from layer to layer. For this reason, partial quantization and layer-wise mixed-precision quantization have been considered. However, layer-wise quantization operates at a coarse granularity and therefore has a large impact on accuracy, so it generally requires retraining the network, which incurs a high computational cost. In this study, we propose a new search algorithm for partial quantization that derives practical combinations of quantized channels without retraining. Under 4-bit quantization, the proposed method quantizes 83.3% of the parameters of ResNet18 without degrading accuracy, and it likewise compresses 80.8% of the parameters of ResNet34 without accuracy degradation.
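The abstract does not spell out the search procedure itself, so the following is only a minimal sketch of one plausible Δloss-guided channel search, assuming a PyTorch model and a small calibration loader; `quantize_tensor`, `calibration_loss`, and `delta_loss_channel_search` are hypothetical names for illustration, not the authors' API.

```python
import torch

def quantize_tensor(w, num_bits=4):
    """Uniform symmetric fake-quantization of a weight tensor (assumed scheme)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max() / qmax
    if scale == 0:
        return w.clone()
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale

@torch.no_grad()
def calibration_loss(model, loader, criterion, device="cpu"):
    """Average loss over a small calibration set."""
    model.eval()
    total, n = 0.0, 0
    for x, y in loader:
        total += criterion(model(x.to(device)), y.to(device)).item() * x.size(0)
        n += x.size(0)
    return total / n

@torch.no_grad()
def delta_loss_channel_search(model, loader, criterion, num_bits=4, tol=0.0, device="cpu"):
    """Greedy channel-wise partial quantization guided by per-channel Δloss.

    Step 1 measures the Δloss of quantizing each output channel in isolation;
    step 2 quantizes channels in ascending Δloss order, reverting any channel
    that pushes the cumulative loss more than `tol` above the float baseline.
    """
    base = calibration_loss(model, loader, criterion, device)
    convs = [m for m in model.modules() if isinstance(m, torch.nn.Conv2d)]

    # Step 1: per-channel sensitivity (Δloss) with all other channels in float.
    candidates = []
    for conv in convs:
        for c in range(conv.weight.size(0)):
            saved = conv.weight[c].clone()
            conv.weight[c] = quantize_tensor(saved, num_bits)
            dl = calibration_loss(model, loader, criterion, device) - base
            conv.weight[c] = saved  # restore the float channel
            candidates.append((dl, conv, c))

    # Step 2: quantize the least-sensitive channels first, within the budget.
    quantized = []
    for dl, conv, c in sorted(candidates, key=lambda t: t[0]):
        saved = conv.weight[c].clone()
        conv.weight[c] = quantize_tensor(saved, num_bits)
        if calibration_loss(model, loader, criterion, device) - base > tol:
            conv.weight[c] = saved  # revert: exceeds the loss budget
        else:
            quantized.append((conv, c))
    return quantized  # list of (Conv2d module, channel index) left quantized
```

In this greedy variant the dominant cost is the repeated calibration passes (one per channel in each step), which is the price paid for avoiding retraining; the paper's actual selection strategy may differ from this sketch.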
Index Terms
- Channel-wise quantization without accuracy degradation using Δloss analysis