Dynamic Quantization Range Control for Analog-in-Memory Neural Networks Acceleration

Published: 06 June 2022

Abstract

Analog in-Memory Computing (AiMC) based neural network acceleration is a promising solution for increasing the energy efficiency of deep neural network deployment. However, the quantization requirements of these analog systems are not compatible with state-of-the-art neural network quantization techniques. Indeed, while modern deep neural network quantization techniques consider the quantization of weights and activations, AiMC accelerators additionally impose the quantization of each Matrix-Vector Multiplication (MVM) result. In most demonstrated AiMC implementations, the quantization range of the MVM results is treated as a fixed parameter of the accelerator. This work demonstrates that dynamic control over this quantization range is not only possible but also desirable for analog neural network acceleration. An AiMC-compatible quantization flow, coupled with a hardware-aware quantization range driving technique, is introduced to fully exploit these dynamic ranges. Using CIFAR-10 and ImageNet as benchmarks, the proposed solution yields networks that are both more accurate and more robust to the inherent vulnerability of analog circuits than approaches based on a fixed quantization range.
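To make the MVM-result quantization concrete, the sketch below is illustrative only, not the paper's implementation: the function names, the 6-bit ADC resolution, and the q_range parameter are assumptions chosen for the example. It simulates an ideal AiMC dot product whose analog output is clipped to a configurable quantization range and digitized by a uniform ADC.

    import numpy as np

    def quantize_mvm_result(y_analog, adc_bits=6, q_range=1.0):
        # Clip the analog MVM output to [-q_range, q_range] and snap it to
        # one of the 2**adc_bits uniform ADC levels (returned dequantized).
        steps = 2 ** adc_bits - 1
        y_clipped = np.clip(y_analog, -q_range, q_range)
        lsb = 2.0 * q_range / steps
        return np.round(y_clipped / lsb) * lsb

    def aimc_mvm(weights, activations, adc_bits=6, q_range=1.0):
        # Noise-free analog accumulation followed by per-output ADC
        # quantization; a dynamic range controller would set q_range per
        # layer (or per tile) instead of keeping it fixed.
        y_analog = weights @ activations
        return quantize_mvm_result(y_analog, adc_bits, q_range)

    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 128)) * 0.05   # hypothetical weight matrix
    x = rng.standard_normal(128)               # hypothetical activation vector
    print(aimc_mvm(W, x, q_range=2.0))  # wide range: no clipping, coarse step
    print(aimc_mvm(W, x, q_range=0.5))  # narrow range: finer step, may clip

The trade-off this exposes is the one the abstract refers to: a wider q_range avoids saturating large partial sums but coarsens the quantization step, while a narrower range refines the step but clips more outputs, which is why a single fixed range is generally a poor fit for every layer.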
Appendix
A Hyperparameter Tuning


Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 27, Issue 5
September 2022
274 pages
ISSN: 1084-4309
EISSN: 1557-7309
DOI: 10.1145/3540253

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 June 2022
Online AM: 24 February 2022
Accepted: 01 November 2021
Revised: 01 October 2021
Received: 01 July 2021
Published in TODAES Volume 27, Issue 5


Author Tags

  1. Neural networks
  2. quantization
  3. in-memory-computing

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • Flemish Government (AI Research Program) and KU Leuven


Cited By

  • Input-Conditioned Quantisation for ENOB Improvement in CIM ADC Columns Targeting Large-Length Partial Sums. IEEE Transactions on Circuits and Systems II: Express Briefs 71, 6 (June 2024), 2971–2975. DOI: 10.1109/TCSII.2024.3357064
  • Measurement-driven neural-network training for integrated magnetic tunnel junction arrays. Physical Review Applied 21, 5 (14 May 2024). DOI: 10.1103/PhysRevApplied.21.054028
  • Modelling and Optimization of a Mixed-Signal Accelerator for Deep Neural Networks. In 2023 19th International Conference on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design (SMACD), 1–4 (3 July 2023). DOI: 10.1109/SMACD58065.2023.10192111
  • AIMC Modeling and Parameter Tuning for Layer-Wise Optimal Operating Point in DNN Inference. IEEE Access 11 (2023), 87189–87199. DOI: 10.1109/ACCESS.2023.3305432
  • Analog Compute in Memory and Breaking Digital Number Representations. In 2022 IFIP/IEEE 30th International Conference on Very Large Scale Integration (VLSI-SoC), 1–2 (3 October 2022). DOI: 10.1109/VLSI-SoC54400.2022.9939611
