research-article

Reliability-Aware Training and Performance Modeling for Processing-In-Memory Systems

Authors:

Huazhong Yang

ASPDAC '21: Proceedings of the 26th Asia and South Pacific Design Automation Conference

Pages 847 - 852

https://doi.org/10.1145/3394885.3431633

Published: 29 January 2021

Abstract

Memristor-based Processing-In-Memory (PIM) systems offer an alternative way to boost the computing energy efficiency of Convolutional Neural Network (CNN) based algorithms. However, the high interface cost of Analog-to-Digital Converters (ADCs) and the limited size of memristor crossbars make it challenging to map CNN models onto PIM systems with both high accuracy and high energy efficiency. In addition, simulating the performance of large-scale PIM systems takes a long time, which makes PIM system development unacceptably slow. To address these problems, we propose a reliability-aware training framework and a behavior-level modeling tool (MNSIM 2.0) for PIM accelerators. The training framework, which combines network splitting/merging analysis with a PIM-based non-uniform activation quantization scheme, improves energy efficiency by reducing the ADC resolution required by memristor crossbars. Moreover, MNSIM 2.0 provides a general modeling method for PIM architecture design and computation data flow; it can evaluate both accuracy and hardware performance within a short time. Experiments based on MNSIM 2.0 show that the reliability-aware training framework can improve the energy efficiency of PIM accelerators by 3.4x with little accuracy loss. The equivalent energy efficiency is 9.02 TOPS/W, nearly 2.6x to 4.2x that of existing work. We also evaluate further case studies with MNSIM 2.0, which help us balance the trade-off between accuracy and hardware performance.
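
The two ideas named in the abstract, lowering ADC resolution by working with smaller crossbar partitions and quantizing activations non-uniformly, can be illustrated with a small sketch. The abstract does not describe the paper's actual scheme, so everything here is an assumption for illustration: the idealized column-sum model in lossless_adc_bits, the use of Lloyd-Max (1-D k-means) as a generic stand-in for a non-uniform quantizer, the synthetic exponential activations, and all function names.

```python
import random


def lossless_adc_bits(rows, input_bits=1, cell_bits=1):
    """Bits needed to digitize the largest possible column sum without clipping.

    Idealized model (an assumption, not taken from the paper): each of the
    `rows` cells adds at most (2**input_bits - 1) * (2**cell_bits - 1).
    """
    max_sum = rows * ((1 << input_bits) - 1) * ((1 << cell_bits) - 1)
    return max_sum.bit_length()


def quantize(x, levels):
    """Map x to the nearest quantization level."""
    return min(levels, key=lambda q: abs(q - x))


def mse(samples, levels):
    """Mean squared quantization error of `samples` under `levels`."""
    return sum((x - quantize(x, levels)) ** 2 for x in samples) / len(samples)


def lloyd_max(samples, levels, iterations=20):
    """Refine `levels` with Lloyd-Max updates (generic stand-in quantizer)."""
    levels = list(levels)
    for _ in range(iterations):
        buckets = [[] for _ in levels]
        for x in samples:
            idx = min(range(len(levels)), key=lambda i: abs(levels[i] - x))
            buckets[idx].append(x)
        # Move each level to the mean of its bucket; keep empty buckets as-is.
        levels = [sum(b) / len(b) if b else q for b, q in zip(buckets, levels)]
    return levels


# Synthetic, skewed "activations" (hypothetical data, seeded for repeatability).
rng = random.Random(0)
activations = [rng.expovariate(1.0) for _ in range(2000)]

n_levels = 8  # a 3-bit quantizer
top = max(activations)
uniform = [top * i / (n_levels - 1) for i in range(n_levels)]
nonuniform = lloyd_max(activations, uniform)

for rows in (512, 256, 128, 64):
    print(f"{rows:4d} rows -> {lossless_adc_bits(rows)} ADC bits (idealized)")
print("uniform MSE:    ", mse(activations, uniform))
print("non-uniform MSE:", mse(activations, nonuniform))
```

Under this idealized model, halving the number of crossbar rows that feed one ADC removes one bit from the lossless resolution requirement, and starting Lloyd-Max from the uniform levels can only keep or reduce the quantization error on the sample data. Neither result says anything about the accuracy or energy figures reported in the paper.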

Cited By

Zhang B, Chen C, Verma N, Salakhutdinov R, Kolter Z, Heller K, Weller A, Oliver N, Scarlett J, Berkenkamp F. (2024). Reshape and adapt for output quantization (raoq). Proceedings of the 41st International Conference on Machine Learning, 58739-58762. Online publication date: 21-Jul-2024.
https://dl.acm.org/doi/10.5555/3692070.3694493
Mandal S, Tong J, Ayoub R, Kishinevsky M, Abousamra A, Ogras U. (2021). Theoretical Analysis and Evaluation of NoCs with Weighted Round-Robin Arbitration. 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD), 1-9. Online publication date: 1-Nov-2021.
https://doi.org/10.1109/ICCAD51958.2021.9643448

Published In

ASPDAC '21: Proceedings of the 26th Asia and South Pacific Design Automation Conference

January 2021

930 pages

ISBN: 9781450379991

DOI: 10.1145/3394885

Copyright © 2021 ACM.

Sponsors

SIGDA: ACM Special Interest Group on Design Automation
IEEE CAS
IEEE CEDA

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ASPDAC '21

Sponsor:

SIGDA

ASPDAC '21: 26th Asia and South Pacific Design Automation Conference

January 18 - 21, 2021

Tokyo, Japan

Acceptance Rates

ASPDAC '21 Paper Acceptance Rate 111 of 368 submissions, 30%;

Overall Acceptance Rate 466 of 1,454 submissions, 32%
