A Novel Architecture Design for Output Significance Aligned Flow with Adaptive Control in ReRAM-based Neural Network Accelerator

Published: 22 November 2022

Abstract

Resistive-RAM-based (ReRAM-based) computing shows great potential for accelerating DNN inference through its highly parallel structure. Regrettably, computing accuracy in practice is much lower than expected due to non-ideal ReRAM devices. The conventional computing flow with a fixed wordline-activation scheme can effectively protect computing accuracy, but at the cost of significant reductions in performance and energy savings. To resolve this tension among accuracy, performance, and energy, this article proposes a new Adaptive-Wordline-Activation control scheme (AWA-control) and combines it with a theoretical Output-Significance-Aligned computing flow (OSA-flow) to enable fine-grained control over output significances, which differ in their impact on the final result. We demonstrate an AWA-control-supported OSA-flow architecture with maximal compatibility with the conventional crossbar, using input retiming and weight remapping with shift registers to enable the new flow. Moreover, in contrast to the conventional computing architecture, the OSA-flow architecture is better able to exploit the data sparsity commonly seen in DNN models, so we also design a sparsity-aware OSA-flow architecture for further DNN speedup. Evaluation results show that the OSA-flow architecture can provide a significant performance improvement of 21.6× and energy savings of 96.2% over the conventional computing architecture with similar DNN accuracy.
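The idea can be made concrete with a small sketch. In bit-serial ReRAM matrix-vector multiplication, the partial sum of input bit i against weight bit-slice j carries output significance 2^(i+j); a conventional flow iterates over input bits, while an output-significance-aligned ordering produces all partial sums of equal significance s = i + j together, which is what permits a per-significance wordline-activation budget. The Python fragment below is a minimal illustrative sketch, not the paper's implementation: the 4-bit widths, 1-bit cells, the max_wl activation cap, and the ideal noise-free crossbar model are all assumptions made for illustration.

    import numpy as np

    IN_BITS, W_BITS = 4, 4  # assumed input/weight bit widths

    def conventional_mvm(x, W):
        """Reference flow: partial sums grouped by *input* bit."""
        acc = np.zeros(W.shape[1], dtype=np.int64)
        for i in range(IN_BITS):              # feed one input bit per step
            xb = (x >> i) & 1                 # 0/1 wordline voltages
            for j in range(W_BITS):           # one 1-bit crossbar slice
                wb = (W >> j) & 1             # 0/1 cell conductances
                acc += (xb @ wb) << (i + j)   # shift-and-add
        return acc

    def osa_mvm(x, W, max_wl=None):
        """OSA-style ordering: iterate over output significance s = i + j,
        so every partial sum with the same weight on the final result is
        produced together; max_wl caps simultaneously active wordlines
        (an adaptive controller could vary it per significance)."""
        rows = x.shape[0]
        step = max_wl or rows
        acc = np.zeros(W.shape[1], dtype=np.int64)
        for s in range(IN_BITS + W_BITS - 1):     # low to high significance
            for i in range(IN_BITS):
                j = s - i
                if not 0 <= j < W_BITS:
                    continue
                xb = (x >> i) & 1
                wb = (W >> j) & 1
                for r in range(0, rows, step):    # wordline batching
                    acc += (xb[r:r+step] @ wb[r:r+step]) << s
        return acc

    x = np.random.randint(0, 1 << IN_BITS, size=8)
    W = np.random.randint(0, 1 << W_BITS, size=(8, 4))
    assert np.array_equal(conventional_mvm(x, W), x @ W)
    assert np.array_equal(osa_mvm(x, W, max_wl=2), x @ W)

Because each group contributes a single significance, throttling or skipping low-significance groups degrades the result gracefully, which is the property an adaptive wordline-activation control can exploit against device non-idealities.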


Cited By

  • (2024) Real-time Blood Pressure Prediction on Wearables with Edge-Based DNNs: A Co-Design Approach. ACM Transactions on Design Automation of Electronic Systems 30, 1 (2024), 1–24. DOI: 10.1145/3699512. Online publication date: 7-Oct-2024.




          Published In

          ACM Transactions on Design Automation of Electronic Systems, Volume 27, Issue 6
          November 2022
          285 pages
          ISSN:1084-4309
          EISSN:1557-7309
          DOI:10.1145/3544939

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          Published: 22 November 2022
          Online AM: 23 May 2022
          Accepted: 06 January 2022
          Revised: 04 December 2021
          Received: 10 October 2021
          Published in TODAES Volume 27, Issue 6


          Author Tags

          1. ISA-flow
          2. OSA-flow
          3. FWA-control
          4. AWA-control
          5. sparsity exploitation

          Qualifiers

          • Research-article
          • Refereed

          Funding Sources

          • National Key Research and Development Program of China
          • National Natural Science Foundation of China
