A Novel Architecture Design for Output Significance Aligned Flow with Adaptive Control in ReRAM-based Neural Network Accelerator

Published: 22 November 2022

Abstract

Resistive-RAM-based (ReRAM-based) computing shows great potential for accelerating DNN inference through its highly parallel structure. Regrettably, computing accuracy in practice is much lower than expected due to non-ideal ReRAM devices. The conventional computing flow with a fixed wordline-activation scheme can effectively protect computing accuracy, but at the cost of significant reductions in performance and energy savings. To resolve this tension among accuracy, performance, and energy, this article proposes a new Adaptive-Wordline-Activation control scheme (AWA-control) and combines it with a theoretical Output-Significance-Aligned computing flow (OSA-flow) to enable fine-grained control over output significances, which differ in their impact on the final result. We demonstrate an AWA-control-supported OSA-flow architecture with maximal compatibility with the conventional crossbar, using input retiming and weight remapping with shift registers to enable the new flow. Moreover, in contrast to the conventional computing architecture, the OSA-flow architecture is better able to exploit the data sparsity commonly seen in DNN models, so we also design a sparsity-aware OSA-flow architecture for further DNN speedup. Evaluation results show that the OSA-flow architecture can provide a significant performance improvement of 21.6× and energy savings of 96.2% over the conventional computing architecture with similar DNN accuracy.
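The idea can be made concrete with a small sketch. In bit-serial ReRAM matrix-vector multiplication, the partial sum of input bit i against weight bit-slice j carries output significance 2^(i+j); a conventional flow iterates over input bits, while an output-significance-aligned ordering produces all partial sums of equal significance s = i + j together, which is what permits a per-significance wordline-activation budget. The Python fragment below is a minimal illustrative sketch, not the paper's implementation: the 4-bit widths, 1-bit cells, the max_wl activation cap, and the ideal noise-free crossbar model are all assumptions made for illustration.

    import numpy as np

    IN_BITS, W_BITS = 4, 4  # assumed input/weight bit widths

    def conventional_mvm(x, W):
        """Reference flow: partial sums grouped by *input* bit."""
        acc = np.zeros(W.shape[1], dtype=np.int64)
        for i in range(IN_BITS):              # feed one input bit per step
            xb = (x >> i) & 1                 # 0/1 wordline voltages
            for j in range(W_BITS):           # one 1-bit crossbar slice
                wb = (W >> j) & 1             # 0/1 cell conductances
                acc += (xb @ wb) << (i + j)   # shift-and-add
        return acc

    def osa_mvm(x, W, max_wl=None):
        """OSA-style ordering: iterate over output significance s = i + j,
        so every partial sum with the same weight on the final result is
        produced together; max_wl caps simultaneously active wordlines
        (an adaptive controller could vary it per significance)."""
        rows = x.shape[0]
        step = max_wl or rows
        acc = np.zeros(W.shape[1], dtype=np.int64)
        for s in range(IN_BITS + W_BITS - 1):     # low to high significance
            for i in range(IN_BITS):
                j = s - i
                if not 0 <= j < W_BITS:
                    continue
                xb = (x >> i) & 1
                wb = (W >> j) & 1
                for r in range(0, rows, step):    # wordline batching
                    acc += (xb[r:r+step] @ wb[r:r+step]) << s
        return acc

    x = np.random.randint(0, 1 << IN_BITS, size=8)
    W = np.random.randint(0, 1 << W_BITS, size=(8, 4))
    assert np.array_equal(conventional_mvm(x, W), x @ W)
    assert np.array_equal(osa_mvm(x, W, max_wl=2), x @ W)

Because each group contributes a single significance, throttling or skipping low-significance groups degrades the result gracefully, which is the property an adaptive wordline-activation control can exploit against device non-idealities.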


Cited By

  • (2024) Real-time Blood Pressure Prediction on Wearables with Edge-Based DNNs: A Co-Design Approach. ACM Transactions on Design Automation of Electronic Systems 30, 1 (2024), 1–24. DOI: 10.1145/3699512. Online publication date: 7-Oct-2024.




          Published In

          ACM Transactions on Design Automation of Electronic Systems, Volume 27, Issue 6
          November 2022
          285 pages
          ISSN:1084-4309
          EISSN:1557-7309
          DOI:10.1145/3544939

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          Published: 22 November 2022
          Online AM: 23 May 2022
          Accepted: 06 January 2022
          Revised: 04 December 2021
          Received: 10 October 2021
          Published in TODAES Volume 27, Issue 6


          Author Tags

          1. ISA-flow
          2. OSA-flow
          3. FWA-control
          4. AWA-control
          5. sparsity exploitation

          Qualifiers

          • Research-article
          • Refereed

          Funding Sources

          • National Key Research and Development Program of China
          • National Natural Science Foundation of China
