Reprogrammable Non-Linear Circuits Using ReRAM for NN Accelerators

Published: 27 January 2024

Abstract

As artificial intelligence techniques see massive usage across the economy, researchers are exploring ways to reduce the energy consumption of Neural Network (NN) applications, especially as the complexity of NNs continues to increase. Using analog Resistive RAM devices to compute matrix-vector multiplication in O(1) time complexity is a promising approach, but these implementations often fail to cover the diversity of non-linearities required by modern NN applications. In this work, we propose a novel approach in which the Resistive RAMs themselves can be reprogrammed to compute not only the required matrix multiplications but also the activation functions, Softmax, and pooling layers, reducing the energy consumption of complex NNs. This approach offers more versatility for researching novel NN layouts than custom logic. Results show that our device outperforms analog and digital field-programmable approaches by up to 8.5× in experiments on real-world human activity recognition and language modeling datasets with convolutional neural network, generative pre-trained Transformer, and long short-term memory models.
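To make the crossbar idea concrete, below is a minimal numerical sketch of the idealized in-memory matrix-vector multiply the abstract refers to: weights are mapped onto a differential pair of device conductances, and applying the input as row voltages yields the product as column currents in a single analog step. The conductance range, mapping, and function names here are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Idealized ReRAM crossbar: each weight is stored as a device conductance.
# Driving the rows with input voltages V produces column currents I = G^T @ V
# in one analog step (Kirchhoff's current law) -- the O(1) matrix-vector
# multiply the abstract refers to. Values below are illustrative assumptions.

G_MIN, G_MAX = 1e-6, 1e-4  # assumed conductance range in siemens


def weights_to_conductances(W):
    """Linearly map signed weights onto two positive conductance arrays."""
    scale = (G_MAX - G_MIN) / np.max(np.abs(W))
    g_pos = G_MIN + np.clip(W, 0, None) * scale    # positive part of each weight
    g_neg = G_MIN + np.clip(-W, 0, None) * scale   # negative part on a paired column
    return g_pos, g_neg


def crossbar_mvm(W, v):
    """Simulate one analog matrix-vector multiply on a differential crossbar."""
    g_pos, g_neg = weights_to_conductances(W)
    i_pos = g_pos.T @ v          # column currents from the 'positive' array
    i_neg = g_neg.T @ v          # column currents from the 'negative' array
    return i_pos - i_neg         # differential read recovers the signed product


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 3))   # layer weights
    v = rng.standard_normal(4)        # input activations encoded as voltages
    y = crossbar_mvm(W, v)
    ref = W.T @ v
    # The analog result equals the digital product up to a positive scale factor.
    print(np.allclose(y / np.max(np.abs(y)), ref / np.max(np.abs(ref))))
```

The sketch covers only the linear stage; the paper's contribution is reprogramming the same devices to also realize the non-linear stages (activation functions, Softmax, and pooling).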


Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 17, Issue 1
March 2024
446 pages
EISSN: 1936-7414
DOI: 10.1145/3613534
Editor: Deming Chen

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 January 2024
Online AM: 10 October 2023
Accepted: 15 August 2023
Revised: 06 July 2023
Received: 22 February 2023
Published in TRETS Volume 17, Issue 1


Author Tags

  1. Neural networks
  2. Resistive RAM (ReRAM)
  3. in-memory processing
  4. reconfigurable computing

Qualifiers

  • Research-article

Funding Sources

  • Coordenação de Aperfeiçoamento de Pessoal de Nível Superior–Brasil (CAPES), Finance Code 001
  • National Council for Scientific and Technological Development (CNPq)

