Dynamic Quantization Range Control for Analog-in-Memory Neural Networks Acceleration

Published: 06 June 2022

Abstract

Analog in-Memory Computing (AiMC) based neural network acceleration is a promising solution for increasing the energy efficiency of deep neural network deployment. However, the quantization requirements of these analog systems are not compatible with state-of-the-art neural network quantization techniques. Indeed, while modern deep neural network quantization techniques consider the quantization of weights and activations, AiMC accelerators additionally impose the quantization of each Matrix-Vector Multiplication (MVM) result. In most demonstrated AiMC implementations, the quantization range of the MVM results is treated as a fixed parameter of the accelerator. This work demonstrates that dynamic control over this quantization range is not only possible but also desirable for analog neural network acceleration. An AiMC-compatible quantization flow, coupled with a hardware-aware quantization range driving technique, is introduced to fully exploit these dynamic ranges. Using CIFAR-10 and ImageNet as benchmarks, the proposed solution yields networks that are both more accurate and more robust to the inherent vulnerability of analog circuits than approaches based on a fixed quantization range.
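To make the MVM-result quantization concrete, the sketch below is illustrative only, not the paper's implementation: the function names, the 6-bit ADC resolution, and the q_range parameter are assumptions chosen for the example. It simulates an ideal AiMC dot product whose analog output is clipped to a configurable quantization range and digitized by a uniform ADC.

    import numpy as np

    def quantize_mvm_result(y_analog, adc_bits=6, q_range=1.0):
        # Clip the analog MVM output to [-q_range, q_range] and snap it to
        # one of the 2**adc_bits uniform ADC levels (returned dequantized).
        steps = 2 ** adc_bits - 1
        y_clipped = np.clip(y_analog, -q_range, q_range)
        lsb = 2.0 * q_range / steps
        return np.round(y_clipped / lsb) * lsb

    def aimc_mvm(weights, activations, adc_bits=6, q_range=1.0):
        # Noise-free analog accumulation followed by per-output ADC
        # quantization; a dynamic range controller would set q_range per
        # layer (or per tile) instead of keeping it fixed.
        y_analog = weights @ activations
        return quantize_mvm_result(y_analog, adc_bits, q_range)

    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 128)) * 0.05   # hypothetical weight matrix
    x = rng.standard_normal(128)               # hypothetical activation vector
    print(aimc_mvm(W, x, q_range=2.0))  # wide range: no clipping, coarse step
    print(aimc_mvm(W, x, q_range=0.5))  # narrow range: finer step, may clip

The trade-off this exposes is the one the abstract refers to: a wider q_range avoids saturating large partial sums but coarsens the quantization step, while a narrower range refines the step but clips more outputs, which is why a single fixed range is generally a poor fit for every layer.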
Appendix
A Hyperparameter Tuning


Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 27, Issue 5
September 2022
274 pages
ISSN: 1084-4309
EISSN: 1557-7309
DOI: 10.1145/3540253

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 June 2022
Online AM: 24 February 2022
Accepted: 01 November 2021
Revised: 01 October 2021
Received: 01 July 2021
Published in TODAES Volume 27, Issue 5


Author Tags

  1. Neural networks
  2. quantization
  3. in-memory-computing

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • Flemish Government (AI Research Program) and KU Leuven


Cited By

  • Input-Conditioned Quantisation for ENOB Improvement in CIM ADC Columns Targeting Large-Length Partial Sums. IEEE Transactions on Circuits and Systems II: Express Briefs 71, 6 (June 2024), 2971–2975. DOI: 10.1109/TCSII.2024.3357064
  • Measurement-driven neural-network training for integrated magnetic tunnel junction arrays. Physical Review Applied 21, 5 (14 May 2024). DOI: 10.1103/PhysRevApplied.21.054028
  • Modelling and Optimization of a Mixed-Signal Accelerator for Deep Neural Networks. In 2023 19th International Conference on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design (SMACD), 1–4 (3 July 2023). DOI: 10.1109/SMACD58065.2023.10192111
  • AIMC Modeling and Parameter Tuning for Layer-Wise Optimal Operating Point in DNN Inference. IEEE Access 11 (2023), 87189–87199. DOI: 10.1109/ACCESS.2023.3305432
  • Analog Compute in Memory and Breaking Digital Number Representations. In 2022 IFIP/IEEE 30th International Conference on Very Large Scale Integration (VLSI-SoC), 1–2 (3 October 2022). DOI: 10.1109/VLSI-SoC54400.2022.9939611
