Research Article

Static Scheduling of Weight Programming for DNN Acceleration with Resource Constrained PIM

Published: 11 September 2024

Abstract

Most existing architectural studies of ReRAM-based processing-in-memory (PIM) DNN accelerators assume that all weights of a DNN can be mapped onto the crossbars at once. This assumption is over-idealized: due to technology limitations, the ReRAM crossbar resources available for computation are scarce, so weights must be programmed onto the crossbars multiple times during inference. In this article, we propose a static scheduling framework that generates a mapping between DNN weights and ReRAM cells with minimal runtime weight-programming cost. We first build a ReRAM crossbar programming-latency model that jointly considers the DNN weight patterns, the ReRAM programming operations, and the characteristics of the PIM architecture. This model then drives a search process that produces an optimized weight-to-OU (operation unit) mapping table with minimal online programming latency. Finally, an OU scheduler coordinates the activation sequences of the OUs in the crossbars so that the inference computation is performed correctly. Evaluation results show that the proposed framework significantly reduces both the weight-programming overhead and the overall inference latency for various DNN models and input datasets.
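As a rough illustration of the flow the abstract describes, the sketch below scores candidate weight-to-OU assignments with a programming-latency model and keeps the cheapest one. This is a minimal sketch, not the paper's implementation: the toy cost model, the `CROSSBAR_OUS` and `PROGRAM_CYCLES` constants, and the exhaustive search are all simplifying assumptions introduced here for illustration.

```python
from itertools import permutations

CROSSBAR_OUS = 4       # OUs available per crossbar (assumed, not from the paper)
PROGRAM_CYCLES = 100   # cost of reprogramming one OU (assumed, not from the paper)

def programming_latency(mapping, resident):
    """Toy latency model: every weight tile assigned to an OU that does not
    already hold that tile must be reprogrammed at runtime."""
    return sum(PROGRAM_CYCLES
               for ou, tile in mapping.items()
               if resident.get(ou) != tile)

def best_mapping(tiles, resident):
    """Exhaustively try assignments of weight tiles to OUs and return the
    one with the lowest online programming latency."""
    best, best_cost = None, float("inf")
    for ous in permutations(range(CROSSBAR_OUS), len(tiles)):
        mapping = dict(zip(ous, tiles))
        cost = programming_latency(mapping, resident)
        if cost < best_cost:
            best, best_cost = mapping, cost
    return best, best_cost

# OUs 0 and 2 already hold tiles "A" and "B"; mapping the next layer's tiles
# ["A", "B", "C"] should reuse them and reprogram only one OU.
resident = {0: "A", 2: "B"}
mapping, cost = best_mapping(["A", "B", "C"], resident)
print(mapping, cost)   # -> {0: 'A', 2: 'B', 1: 'C'} 100
```

A practical scheduler would replace the exhaustive search with a scalable heuristic and extend the cost model with the weight-pattern, programming-operation, and architecture terms the abstract describes.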




Published In

ACM Transactions on Embedded Computing Systems, Volume 23, Issue 6
November 2024
505 pages
EISSN: 1558-3465
DOI: 10.1145/3613645
Editor: Tulika Mitra

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 September 2024
Online AM: 11 August 2023
Accepted: 22 July 2023
Revised: 21 May 2023
Received: 26 January 2023
Published in TECS Volume 23, Issue 6


Author Tags

  1. PIM
  2. ReRAM
  3. resource-constrained
  4. DNN accelerator

Qualifiers

  • Research-article

Funding Sources

  • Natural Science Foundation of China
  • Taishan Scholars Program
  • Qilu Young Scholar Program of Shandong University
