
Power Awareness in Low Precision Neural Networks

  • Conference paper
  • In: Computer Vision – ECCV 2022 Workshops (ECCV 2022)

Abstract

Existing approaches for reducing DNN power consumption rely on general principles, such as avoiding multiplication operations and aggressively quantizing weights and activations. However, these methods do not account for the precise power consumed by each module in the network and are therefore not optimal. In this paper we develop accurate power consumption models for all arithmetic operations in the DNN, under various working conditions. We reveal several important factors that have been overlooked to date. Based on our analysis, we present PANN (power-aware neural network), a simple approach for approximating any full-precision network by a low-power fixed-precision variant. Our method can be applied to a pre-trained network and can also be used during training to achieve improved performance. Unlike previous methods, PANN incurs only a minor degradation in accuracy w.r.t. the full-precision version of the network and makes it possible to seamlessly traverse the power-accuracy trade-off at deployment time.
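
As a rough illustration of the general idea only (not the PANN algorithm itself, which is described in the paper), the sketch below approximates a full-precision weight matrix at several fixed bit-widths; the uniform `quantize_weights` helper is an illustrative assumption, and lower bit-widths stand in for cheaper, lower-power arithmetic at the cost of a larger approximation error.

```python
import numpy as np

def quantize_weights(w, num_bits):
    """Uniformly round weights to a signed fixed-point grid with num_bits bits.

    Generic uniform quantizer used only to illustrate how the chosen bit-width
    trades accuracy for arithmetic cost; this is not the PANN scheme.
    """
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int32), scale

# Approximate a random full-precision layer at several bit-widths and report the
# reconstruction error, mimicking a sweep over power-accuracy operating points.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
for bits in (8, 4, 2):
    q, scale = quantize_weights(w, bits)
    print(f"{bits}-bit approximation: mean abs error = {np.abs(q * scale - w).mean():.4f}")
```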

Notes

  1. https://www.graphcore.ai/.

  2. The power consumed by a single bit flip may vary across platforms (e.g., between a 5 nm and a 45 nm fabrication process), but the number of bit flips per MAC does not change. We therefore report power in units of bit flips, which allows comparing implementations independently of the platform (a toy bit-flip count is sketched after these notes).

  3. Batch-norm layers should first be absorbed into the weights and biases (see the folding sketch after these notes).

  4. In quantized models, MAC operations are always performed on integers and rescaling is applied at the end (see the sketch after these notes).
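
As mentioned in Note 2, power is reported in units of bit flips. The following toy sketch, under simplifying assumptions that are not taken from the paper (an idealized integer datapath in which only accumulator-register toggles are counted), shows one way to count bit flips per MAC for different operand bit-widths; the paper's per-module power model is more detailed.

```python
import numpy as np

def popcount(x):
    """Number of set bits in a non-negative Python integer."""
    return bin(x).count("1")

def accumulator_bit_flips(weights, activations, acc_bits=32):
    """Toy proxy: count bits that toggle in the accumulator during a dot product.

    Successive accumulator states are XOR-ed and their differing bits counted.
    This is only an illustrative stand-in for a hardware power model.
    """
    mask = (1 << acc_bits) - 1
    acc, flips = 0, 0
    for w, a in zip(weights, activations):
        new_acc = (acc + int(w) * int(a)) & mask   # two's-complement wraparound
        flips += popcount(acc ^ new_acc)
        acc = new_acc
    return flips

rng = np.random.default_rng(0)
for bits in (8, 4, 2):
    hi = 2 ** (bits - 1)
    w = rng.integers(-hi, hi, size=256)            # signed weights
    a = rng.integers(0, 2 ** bits, size=256)       # unsigned activations
    print(f"{bits}-bit operands: {accumulator_bit_flips(w, a)} accumulator bit flips")
```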

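Notes 3 and 4 describe standard practice in quantized inference: batch-norm parameters are folded into the preceding layer's weights and bias, and inference then runs integer MACs with a single rescaling at the end. The sketch below illustrates both steps on a toy fully connected layer; the folding formula is the standard one, while the per-tensor uniform quantizer and the specific bit-widths are illustrative assumptions.

```python
import numpy as np

def fold_batch_norm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Absorb batch-norm parameters into the layer's weights and bias (Note 3)."""
    s = gamma / np.sqrt(var + eps)                 # per-output-channel scale
    return w * s[:, None], (b - mean) * s + beta

def quantize(x, num_bits):
    """Per-tensor uniform quantization to signed integers (illustrative only)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.round(x / scale).astype(np.int32), scale

rng = np.random.default_rng(0)
w, b = rng.normal(size=(16, 32)), np.zeros(16)
gamma, beta = np.ones(16), np.zeros(16)
mean, var = rng.normal(size=16), np.abs(rng.normal(size=16))
x = rng.normal(size=32)

w_f, b_f = fold_batch_norm(w, b, gamma, beta, mean, var)
qw, sw = quantize(w_f, 4)                          # 4-bit weights
qx, sx = quantize(x, 8)                            # 8-bit activations

# Note 4: the MACs themselves run on integers; rescaling is applied once at the end.
y_int = qw @ qx                                    # integer accumulation
y = y_int * (sw * sx) + b_f
print("max abs error vs. float layer:", np.max(np.abs(y - (w_f @ x + b_f))))
```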

Acknowledgements

This research was partially supported by the Ollendorff Minerva Center at the Viterbi Faculty of Electrical and Computer Engineering, Technion.

Author information

Corresponding author

Correspondence to Nurit Spingarn Eliezer.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1160 KB)

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Eliezer, N.S., Banner, R., Ben-Yaakov, H., Hoffer, E., Michaeli, T. (2023). Power Awareness in Low Precision Neural Networks. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13807. Springer, Cham. https://doi.org/10.1007/978-3-031-25082-8_5

  • DOI: https://doi.org/10.1007/978-3-031-25082-8_5

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25081-1

  • Online ISBN: 978-3-031-25082-8

  • eBook Packages: Computer Science, Computer Science (R0)
