
On Practical Approach to Uniform Quantization of Non-redundant Neural Networks

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning (ICANN 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11728)


Abstract

Neural network quantization is a highly desirable procedure to perform before running neural networks on mobile devices. Quantization without fine-tuning leads to an accuracy drop, whereas the commonly used training with quantization is done on the full labeled dataset and is therefore both time- and resource-consuming. Real-life applications require a simplified and accelerated quantization procedure that maintains the accuracy of the full-precision network, especially for modern mobile architectures such as MobileNet-v1, MobileNet-v2 and MNAS.

Here we present two methods that significantly optimize the training-with-quantization procedure. The first introduces trained scale factors for the discretization thresholds, separate for each filter. The second is based on mutual rescaling of consecutive depthwise separable convolution and convolution layers. Using the proposed techniques, we quantize modern mobile network architectures with a training set of only ~10% of the full ImageNet 2012 sample. This reduction of the training-set size, together with the small number of trainable parameters, allows the network to be fine-tuned within several hours while maintaining the high accuracy of the quantized model (the accuracy drop is less than 0.5%). The ready-to-use models and code are available at: https://github.com/agoncharenko1992/FAT-fast-adjustable-threshold.
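The abstract describes both techniques only at a high level; the short NumPy sketch below is meant to make them concrete. It is a minimal illustration under assumptions of my own: the function names (quantize_per_filter, rescale_depthwise_pointwise), the symmetric 8-bit scheme, and the particular choice of the rescaling factors r are not taken from the authors' repository.

```python
# Minimal NumPy sketch of the two ideas from the abstract (illustrative only;
# names and details are assumptions, not the authors' implementation).
import numpy as np


def quantize_per_filter(x, base_threshold, alpha, n_bits=8):
    """Symmetric uniform 'fake' quantization with a per-filter scale factor
    `alpha` applied to the clipping threshold. `x` has shape (C, ...);
    `base_threshold` and `alpha` have shape (C,). In the paper's procedure the
    threshold scale factors are the (few) trainable parameters."""
    levels = 2 ** (n_bits - 1) - 1                      # 127 for 8 bits
    t = (alpha * base_threshold).reshape(-1, *([1] * (x.ndim - 1)))
    step = t / levels                                   # per-filter quantization step
    q = np.clip(np.round(x / step), -levels, levels)    # integer grid
    return q * step                                     # back to float ("de-quantized")


def rescale_depthwise_pointwise(dw_weight, dw_bias, pw_weight):
    """Mutual rescaling of a depthwise convolution followed by a 1x1 convolution.
    Dividing channel c of the depthwise layer (weights and bias) by r[c] and
    multiplying the matching input channel of the pointwise layer by r[c] leaves
    the composed function unchanged for r[c] > 0 (also through ReLU), while
    balancing the per-channel ranges before quantization.

    Shapes: dw_weight (C, kH, kW), dw_bias (C,), pw_weight (C_out, C, 1, 1).
    The choice of r below (equalize per-channel maxima) is one possible option."""
    per_channel_max = np.abs(dw_weight).reshape(dw_weight.shape[0], -1).max(axis=1)
    r = per_channel_max / per_channel_max.mean()
    return dw_weight / r[:, None, None], dw_bias / r, pw_weight * r[None, :, None, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.normal(size=(16, 8, 8))                  # toy per-filter activations
    thr = np.abs(acts).reshape(16, -1).max(axis=1)      # initial thresholds
    alpha = np.ones(16)                                 # would be trained in practice
    print(quantize_per_filter(acts, thr, alpha).shape)

    dw_w = rng.normal(size=(16, 3, 3))
    dw_b = rng.normal(size=(16,))
    pw_w = rng.normal(size=(32, 16, 1, 1))
    dw_w2, _, _ = rescale_depthwise_pointwise(dw_w, dw_b, pw_w)
    # After rescaling, the per-channel weight maxima are equalized:
    print(np.abs(dw_w2).reshape(16, -1).max(axis=1))
```

In the actual method the thresholds are adjusted through the trained scale factors during the short fine-tuning on ~10% of ImageNet; here alpha is fixed to 1 purely to keep the sketch self-contained.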


Notes

  1. https://www.tensorflow.org/lite.

  2. https://developer.nvidia.com/tensorrt - NVIDIA TensorRT™ platform, 2018.

  3. https://github.com/NervanaSystems/distiller.

  4. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/models.md - image classification (Quantized Models).

  5. The network accuracy is measured on the full ImageNet 2012 validation set, which includes single-channel images.


Author information

Correspondence to Alexander Goncharenko.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Goncharenko, A., Denisov, A., Alyamkin, S., Terentev, E. (2019). On Practical Approach to Uniform Quantization of Non-redundant Neural Networks. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning. ICANN 2019. Lecture Notes in Computer Science, vol. 11728. Springer, Cham. https://doi.org/10.1007/978-3-030-30484-3_29


  • DOI: https://doi.org/10.1007/978-3-030-30484-3_29


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30483-6

  • Online ISBN: 978-3-030-30484-3

  • eBook Packages: Computer Science, Computer Science (R0)
