10.1145/3545008.3545053acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicppConference Proceedingsconference-collections
research-article

Mentha: Enabling Sparse-Packing Computation on Systolic Arrays

Published: 13 January 2023

ABSTRACT

Generalized Sparse Matrix-Matrix Multiplication (SpGEMM) is a critical kernel in domains such as graph analytics and scientific computing. Systolic arrays, a classical special-purpose architecture, were originally designed for compute-intensive problems such as matrix multiplication. However, classical systolic arrays are inefficient on sparse matrices because PEs that hold zero-valued entries perform operations that contribute nothing to the result. In this paper, we propose Mentha, a framework that enables systolic arrays to accelerate sparse matrix computation through a sparse-packing algorithm suited to various systolic-array dataflows. Mentha supports both online and offline packing: by packing the rows or columns of a sparse matrix, it greatly reduces the number of zero-valued entries and increases the matrix's density. In addition, its adaptation scheme delivers speedups even when hardware resources are limited. Moreover, we reconfigure the PEs of the systolic array at low cost (1.28x in area, 1.21x in power) and find that, on moderately sparse matrices (sparsity < 0.9), Mentha outperforms a TPU-like systolic array by 1.2–3.3x on SpMM and 1.3–4.4x on SpGEMM, while performing at least 9.7x better than cuSPARSE. Furthermore, experiments on neural networks show a FLOPs reduction of roughly 3.4x.
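The abstract does not spell out the packing algorithm itself, but the row/column-packing idea can be illustrated with a small sketch. The Python snippet below is a minimal, hypothetical greedy column-packing heuristic: columns whose nonzero rows do not overlap are merged into one denser packed column, and a column map records which original columns each packed column absorbed (in a real systolic mapping, each packed entry would also carry its original column index so a PE can select the correct operand). The function name, the max_conflicts parameter, and the greedy strategy are illustrative assumptions, not Mentha's actual algorithm.

```python
# Illustrative sketch only (not Mentha's published algorithm): a greedy
# column-packing heuristic in the spirit of the sparse-packing idea above.
import numpy as np

def pack_columns(A, max_conflicts=0):
    """Greedily merge columns whose nonzero rows overlap in at most
    `max_conflicts` positions. Returns the packed (denser) matrix and,
    for each packed column, the original column indices it absorbed."""
    n_rows, n_cols = A.shape
    groups = []  # list of (set of occupied rows, list of original column ids)
    for j in range(n_cols):
        rows = set(np.flatnonzero(A[:, j]))
        for occupied, members in groups:
            if len(rows & occupied) <= max_conflicts:  # little or no overlap
                occupied |= rows
                members.append(j)
                break
        else:
            groups.append((rows, [j]))
    packed = np.zeros((n_rows, len(groups)))
    col_map = []
    for k, (_, members) in enumerate(groups):
        for j in members:
            packed[:, k] += A[:, j]  # non-conflicting entries simply coexist
        col_map.append(members)
    return packed, col_map

# Toy example: a 4x4 matrix at ~44% density packs into two denser columns.
A = np.array([[1., 0., 0., 2.],
              [0., 3., 0., 0.],
              [4., 0., 0., 5.],
              [0., 0., 6., 0.]])
packed, col_map = pack_columns(A)
print(packed.shape, col_map)  # (4, 2) [[0, 1, 2], [3]]
```

With max_conflicts = 0 the packing is lossless; a nonzero tolerance would pack more aggressively but would require conflict handling in the PEs, which is where a reconfigured systolic array comes in.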


Published in
ICPP '22: Proceedings of the 51st International Conference on Parallel Processing
August 2022 · 976 pages
ISBN: 9781450397339
DOI: 10.1145/3545008
Copyright © 2022 ACM


Publisher
Association for Computing Machinery, New York, NY, United States
