ABSTRACT
Recent studies have shown that the effect of quantization on the accuracy of a convolutional neural network varies from layer to layer. For this reason, partial quantization and layer-wise mixed-precision quantization have been considered. However, layer-wise quantization operates at a coarse granularity and therefore has a large impact on accuracy, so it generally requires retraining the network, which incurs a high computational cost. In this study, we propose a new search algorithm for partial quantization that derives practical combinations of quantized channels without retraining. Under 4-bit quantization, the proposed method quantizes 83.3% of the parameters of ResNet18 without degrading accuracy, and it likewise compresses 80.8% of the parameters of ResNet34 without accuracy degradation.
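The abstract does not spell out the search procedure itself, so the following is only a minimal sketch of one plausible Δloss-guided channel search, assuming a PyTorch model and a small calibration loader; `quantize_tensor`, `calibration_loss`, and `delta_loss_channel_search` are hypothetical names for illustration, not the authors' API.

```python
import torch

def quantize_tensor(w, num_bits=4):
    """Uniform symmetric fake-quantization of a weight tensor (assumed scheme)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max() / qmax
    if scale == 0:
        return w.clone()
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale

@torch.no_grad()
def calibration_loss(model, loader, criterion, device="cpu"):
    """Average loss over a small calibration set."""
    model.eval()
    total, n = 0.0, 0
    for x, y in loader:
        total += criterion(model(x.to(device)), y.to(device)).item() * x.size(0)
        n += x.size(0)
    return total / n

@torch.no_grad()
def delta_loss_channel_search(model, loader, criterion, num_bits=4, tol=0.0, device="cpu"):
    """Greedy channel-wise partial quantization guided by per-channel Δloss.

    Step 1 measures the Δloss of quantizing each output channel in isolation;
    step 2 quantizes channels in ascending Δloss order, reverting any channel
    that pushes the cumulative loss more than `tol` above the float baseline.
    """
    base = calibration_loss(model, loader, criterion, device)
    convs = [m for m in model.modules() if isinstance(m, torch.nn.Conv2d)]

    # Step 1: per-channel sensitivity (Δloss) with all other channels in float.
    candidates = []
    for conv in convs:
        for c in range(conv.weight.size(0)):
            saved = conv.weight[c].clone()
            conv.weight[c] = quantize_tensor(saved, num_bits)
            dl = calibration_loss(model, loader, criterion, device) - base
            conv.weight[c] = saved  # restore the float channel
            candidates.append((dl, conv, c))

    # Step 2: quantize the least-sensitive channels first, within the budget.
    quantized = []
    for dl, conv, c in sorted(candidates, key=lambda t: t[0]):
        saved = conv.weight[c].clone()
        conv.weight[c] = quantize_tensor(saved, num_bits)
        if calibration_loss(model, loader, criterion, device) - base > tol:
            conv.weight[c] = saved  # revert: exceeds the loss budget
        else:
            quantized.append((conv, c))
    return quantized  # list of (Conv2d module, channel index) left quantized
```

In this greedy variant the dominant cost is the repeated calibration passes (one per channel in each step), which is the price paid for avoiding retraining; the paper's actual selection strategy may differ from this sketch.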
Index Terms
- Channel-wise quantization without accuracy degradation using Δloss analysis