DOI: 10.1145/3394885.3431633

Reliability-Aware Training and Performance Modeling for Processing-In-Memory Systems

Published: 29 January 2021

Abstract

Memristor-based Processing-In-Memory (PIM) systems offer an alternative way to boost the computing energy efficiency of Convolutional Neural Network (CNN) based algorithms. However, the high interface cost of Analog-to-Digital Converters (ADCs) and the limited size of memristor crossbars make it challenging to map CNN models onto PIM systems with both high accuracy and high energy efficiency. In addition, simulating the performance of large-scale PIM systems takes a long time, which leads to unacceptable development time. To address these problems, we propose a reliability-aware training framework and a behavior-level modeling tool (MNSIM 2.0) for PIM accelerators. The proposed training framework, which combines network splitting/merging analysis with a PIM-based non-uniform activation quantization scheme, improves energy efficiency by reducing the ADC resolution requirements in memristor crossbars. Moreover, MNSIM 2.0 provides a general modeling method for PIM architecture design and computation data flow, and it can evaluate both accuracy and hardware performance within a short time. Experiments based on MNSIM 2.0 show that the reliability-aware training framework improves the energy efficiency of PIM accelerators by 3.4x with little accuracy loss. The equivalent energy efficiency is 9.02 TOPS/W, roughly 2.6x to 4.2x that of existing work. We also evaluate further case studies with MNSIM 2.0, which help balance the trade-off between accuracy and hardware performance.
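To make the link between crossbar size, quantization, and ADC resolution concrete, the following is a minimal Python sketch. It is not the framework proposed in the paper and not part of MNSIM 2.0. The function names `adc_bits_required` and `quantize_to_levels`, the unsigned-integer model of weights and inputs, and the example level set are all assumptions made for illustration.

```python
from typing import Sequence


def adc_bits_required(rows: int, weight_bits: int, input_bits: int) -> int:
    """Bits needed to read the worst-case column sum of one crossbar exactly.

    Illustrative model (an assumption, not taken from the paper): every cell
    stores an unsigned weight of `weight_bits` bits and every row is driven by
    an unsigned input of `input_bits` bits, so the largest possible column sum
    is rows * (2^weight_bits - 1) * (2^input_bits - 1).
    """
    max_sum = rows * (2 ** weight_bits - 1) * (2 ** input_bits - 1)
    return max(1, max_sum.bit_length())


def quantize_to_levels(value: float, levels: Sequence[float]) -> float:
    """Snap a value to the nearest entry of a (possibly non-uniform) level set."""
    return min(levels, key=lambda level: abs(level - value))


if __name__ == "__main__":
    # Splitting one 256-row column across two 128-row crossbars lowers the
    # resolution each ADC must provide; the two partial sums are then added
    # digitally.
    print("256 rows:", adc_bits_required(256, 1, 1), "bits")
    print("128 rows:", adc_bits_required(128, 1, 1), "bits")

    # A hypothetical non-uniform level set: six levels span 0..16, whereas a
    # uniform integer grid over the same range would need seventeen.
    levels = [0, 1, 2, 4, 8, 16]
    print("5 ->", quantize_to_levels(5, levels))
    print("7 ->", quantize_to_levels(7, levels))
```

In this toy model, halving the number of rows per crossbar removes one bit from the worst-case readout, and a coarser, non-uniform level set covers the same range with fewer distinct codes. Both effects point in the direction the abstract describes, although the actual splitting/merging analysis and quantization scheme may differ substantially from this sketch.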

Published In

ASPDAC '21: Proceedings of the 26th Asia and South Pacific Design Automation Conference
January 2021
930 pages
ISBN: 9781450379991
DOI: 10.1145/3394885

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ASPDAC '21

Acceptance Rates

ASPDAC '21 Paper Acceptance Rate 111 of 368 submissions, 30%;
Overall Acceptance Rate 466 of 1,454 submissions, 32%
