DOI: 10.1145/3529399.3529409
Research article

Channel-wise quantization without accuracy degradation using Δloss analysis

Published: 10 June 2022

ABSTRACT

Recent studies have pointed out that the effect of quantizing convolutional neural networks on accuracy varies from layer to layer. For this reason, partial quantization and mixed-precision quantization at the layer level have been considered. However, layer-wise quantization has a large impact on accuracy because its granularity is coarse, so it generally requires retraining the network, which incurs a high computational cost. In this study, we propose a new search algorithm for partial quantization that derives practical combinations of quantized channels without retraining. The proposed method quantizes 83.3% of the parameters of ResNet18 to 4 bits without degrading accuracy, and likewise compresses 80.8% of the parameters of ResNet34 without accuracy loss.
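The channel-selection idea described in the abstract can be sketched as follows. This is a toy illustration only, not the paper's implementation: the uniform symmetric 4-bit quantizer, the `loss_fn` interface, and the greedy Δloss ordering are all assumptions, since the abstract does not specify them.

```python
import numpy as np

def quantize_4bit(w: np.ndarray) -> np.ndarray:
    """Uniform symmetric 4-bit quantization of one channel's weights.
    (A generic scheme; the paper's exact quantizer is not given in the abstract.)"""
    scale = np.max(np.abs(w)) / 7.0            # symmetric int4 levels in [-7, 7]
    if scale == 0.0:
        return w.copy()                        # all-zero channel: nothing to quantize
    return np.clip(np.round(w / scale), -7, 7) * scale

def delta_loss_order(channels, loss_fn, base_loss):
    """Rank channels by Δloss = loss(channel i quantized) − base_loss.
    `channels` is a list of per-channel weight arrays and `loss_fn` evaluates
    the network loss for a given channel list (both hypothetical interfaces)."""
    deltas = []
    for i, ch in enumerate(channels):
        trial = list(channels)
        trial[i] = quantize_4bit(ch)           # quantize only channel i
        deltas.append((loss_fn(trial) - base_loss, i))
    deltas.sort()                              # smallest Δloss = safest to quantize
    return [i for _, i in deltas]
```

Under this sketch, channels would then be quantized greedily in Δloss order, stopping once validation accuracy degrades, so that no retraining is needed.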


Published in

ICMLT '22: Proceedings of the 2022 7th International Conference on Machine Learning Technologies
March 2022, 291 pages
ISBN: 9781450395748
DOI: 10.1145/3529399
Copyright © 2022 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
