Reprogrammable Non-Linear Circuits Using ReRAM for NN Accelerators

Published: 27 January 2024

Abstract

As artificial intelligence techniques see massive usage across the economy, researchers are exploring ways to reduce the energy consumption of Neural Network (NN) applications, especially as the complexity of NNs continues to increase. Using analog Resistive RAM devices to compute matrix-vector multiplication in O(1) time complexity is a promising approach, but these implementations often fail to cover the diversity of non-linearities required by modern NN applications. In this work, we propose a novel approach in which the Resistive RAMs themselves can be reprogrammed to compute not only the required matrix multiplications but also the activation functions, Softmax, and pooling layers, reducing the energy consumption of complex NNs. This approach offers more versatility for researching novel NN layouts than custom logic. Results show that our device outperforms analog and digital field-programmable approaches by up to 8.5× in experiments on real-world human activity recognition and language modeling datasets with convolutional neural network, generative pre-trained Transformer, and long short-term memory models.
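To make the crossbar idea concrete, below is a minimal numerical sketch of the idealized in-memory matrix-vector multiply the abstract refers to: weights are mapped onto a differential pair of device conductances, and applying the input as row voltages yields the product as column currents in a single analog step. The conductance range, mapping, and function names here are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Idealized ReRAM crossbar: each weight is stored as a device conductance.
# Driving the rows with input voltages V produces column currents I = G^T @ V
# in one analog step (Kirchhoff's current law) -- the O(1) matrix-vector
# multiply the abstract refers to. Values below are illustrative assumptions.

G_MIN, G_MAX = 1e-6, 1e-4  # assumed conductance range in siemens


def weights_to_conductances(W):
    """Linearly map signed weights onto two positive conductance arrays."""
    scale = (G_MAX - G_MIN) / np.max(np.abs(W))
    g_pos = G_MIN + np.clip(W, 0, None) * scale    # positive part of each weight
    g_neg = G_MIN + np.clip(-W, 0, None) * scale   # negative part on a paired column
    return g_pos, g_neg


def crossbar_mvm(W, v):
    """Simulate one analog matrix-vector multiply on a differential crossbar."""
    g_pos, g_neg = weights_to_conductances(W)
    i_pos = g_pos.T @ v          # column currents from the 'positive' array
    i_neg = g_neg.T @ v          # column currents from the 'negative' array
    return i_pos - i_neg         # differential read recovers the signed product


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 3))   # layer weights
    v = rng.standard_normal(4)        # input activations encoded as voltages
    y = crossbar_mvm(W, v)
    ref = W.T @ v
    # The analog result equals the digital product up to a positive scale factor.
    print(np.allclose(y / np.max(np.abs(y)), ref / np.max(np.abs(ref))))
```

The sketch covers only the linear stage; the paper's contribution is reprogramming the same devices to also realize the non-linear stages (activation functions, Softmax, and pooling).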


Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 17, Issue 1
March 2024
446 pages
EISSN: 1936-7414
DOI: 10.1145/3613534
Editor: Deming Chen

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 January 2024
Online AM: 10 October 2023
Accepted: 15 August 2023
Revised: 06 July 2023
Received: 22 February 2023
Published in TRETS Volume 17, Issue 1


Author Tags

  1. Neural networks
  2. Resistive RAM (ReRAM)
  3. in-memory processing
  4. reconfigurable computing

Qualifiers

  • Research-article

Funding Sources

  • Coordenação de Aperfeiçoamento de Pessoal de Nível Superior–Brasil (CAPES), Finance Code 001
  • National Council for Scientific and Technological Development (CNPq)

