DOI: 10.1145/3566097.3567859
Research article

Toward Energy-Efficient Sparse Matrix-Vector Multiplication with near STT-MRAM Computing Architecture

Published: 31 January 2023

ABSTRACT

Sparse matrix-vector multiplication (SpMV) is a vital computational primitive in modern workloads. SpMV is dominated by memory access, which leads to unnecessary data transmission, massive data movement, and redundant multiply-accumulate operations. We therefore propose a near spin-transfer torque magnetic random access memory (STT-MRAM) processing architecture built around three optimizations: (1) the near-memory-processing (NMP) controller receives instructions over the AXI4 bus to carry out the SpMV operation, identifies valid data, and encodes indices according to the kernel size; (2) the NMP controller applies high-level-synthesis dataflow in a shared buffer, improving throughput without consuming bus bandwidth; and (3) configurable MAC units are implemented in the NMP core, eliminating the explicit index-matching step during multiplication. With these optimizations, the NMP architecture accesses the pipelined STT-MRAM at a read bandwidth of 26.7 GB/s. Simulation results show that this design achieves up to 66x and 28x speedups over state-of-the-art designs, and a 69x speedup over a baseline without sparse optimization.
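To make the primitive concrete, the following is a minimal software sketch of SpMV over a compressed sparse row (CSR) matrix. It is illustrative only, not the paper's hardware implementation: it shows why only the nonzero ("valid") entries need to be fetched and multiply-accumulated, which is the access pattern the NMP architecture accelerates.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a CSR-encoded matrix A.

    values:  nonzero entries, row by row
    col_idx: column index of each nonzero
    row_ptr: row_ptr[i]..row_ptr[i+1] delimits row i's nonzeros
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0  # multiply-accumulate over row i's nonzeros only
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y

# Example: the 3x3 matrix [[1,0,2],[0,3,0],[4,0,5]] in CSR form
values  = [1.0, 2.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

Note that only five multiply-accumulates are performed for the nine-element matrix; a dense implementation would perform nine, wasting work on zeros.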


Published in:
ASPDAC '23: Proceedings of the 28th Asia and South Pacific Design Automation Conference
January 2023, 807 pages
ISBN: 9781450397834
DOI: 10.1145/3566097

        Copyright © 2023 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

        Acceptance Rates

ASPDAC '23 paper acceptance rate: 102 of 328 submissions (31%). Overall acceptance rate: 466 of 1,454 submissions (32%).
