DOI: 10.1145/3620666.3651336
Research Article · Open Access

FEASTA: A Flexible and Efficient Accelerator for Sparse Tensor Algebra in Machine Learning

Published: 27 April 2024

ABSTRACT

Sparse tensor algebra (SpTA) plays an increasingly important role in machine learning. However, because of the unstructured sparsity in SpTA, general-purpose processors (e.g., GPUs and CPUs) are inefficient: their hardware resources are underutilized. Sparse kernel accelerators are optimized for specific tasks, but their dedicated processing units and data paths cannot effectively support other SpTA tasks with different dataflows and varying sparsity, which degrades performance. This paper proposes FEASTA, a Flexible and Efficient Accelerator for Sparse Tensor Algebra. To process general SpTA tasks with various sparsity efficiently, we design FEASTA at three levels. At the dataflow-abstraction level, we apply Einstein summation to the sparse fiber-tree data structure to model the unified execution flow of general SpTA as joining and merging fiber trees. At the instruction set architecture (ISA) level, we propose a general SpTA ISA based on this execution flow; it includes distinct instruction types for dense and sparse data, achieving flexibility and efficiency at the instruction level. At the architecture level, we design an instruction-driven architecture composed of configurable, high-performance function units that supports this flexible and efficient ISA. Evaluations show that FEASTA achieves a 5.40× geometric-mean energy-efficiency improvement over a GPU across various workloads. On sparse matrix multiplication kernels, FEASTA delivers 1.47× and 3.19× higher performance than a state-of-the-art sparse matrix accelerator and a CPU extension, respectively. Across diverse kernels, FEASTA achieves 1.69-12.70× higher energy efficiency than existing architectures.
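To make the fiber-tree view of Einstein summation concrete, the sketch below (illustrative only, not code from the paper) models a one-level sparse fiber as a sorted list of (coordinate, value) pairs and shows the two primitives the abstract refers to: a coordinate intersection ("join") that multiplies matching values, as used for a contracted index such as k in C_ij = Σ_k A_ik·B_kj, and a coordinate union ("merge") that adds values, as used when accumulating partial results. All function and variable names are assumptions made for this example.

    # Minimal sketch of fiber join/merge on sorted (coordinate, value) lists.
    # Conceptual illustration of the dataflow abstraction, not FEASTA's ISA or hardware.

    def join(a, b):
        """Intersect coordinates and multiply values (contracted-index traversal)."""
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            (ca, va), (cb, vb) = a[i], b[j]
            if ca == cb:
                out.append((ca, va * vb))
                i += 1
                j += 1
            elif ca < cb:
                i += 1
            else:
                j += 1
        return out

    def merge(a, b):
        """Union coordinates and add values (accumulation of partial results)."""
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            (ca, va), (cb, vb) = a[i], b[j]
            if ca == cb:
                out.append((ca, va + vb))
                i += 1
                j += 1
            elif ca < cb:
                out.append((ca, va))
                i += 1
            else:
                out.append((cb, vb))
                j += 1
        out.extend(a[i:])
        out.extend(b[j:])
        return out

    # Example: a sparse dot product expressed as a join followed by a reduction.
    A = [(0, 2.0), (3, 1.0), (7, 4.0)]
    B = [(3, 5.0), (7, 0.5), (9, 2.0)]
    assert sum(v for _, v in join(A, B)) == 7.0                       # 1.0*5.0 + 4.0*0.5
    assert merge(A, B) == [(0, 2.0), (3, 6.0), (7, 4.5), (9, 2.0)]    # coordinate-wise union

Higher-order kernels follow the same pattern: each level of a fiber tree is traversed with a join (for contracted indices) or a merge (when combining partial outputs), which is the unified execution flow the paper's ISA and architecture are built around.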

Published in

ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3
April 2024, 1106 pages
ISBN: 9798400703867
DOI: 10.1145/3620666

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

Research article

Acceptance Rates

Overall Acceptance Rate: 535 of 2,713 submissions, 20%