
SparG: A Sparse GEMM Accelerator for Deep Learning Applications

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13777)

Abstract

Deep learning has become a highly active field of research. Deep learning algorithms were traditionally executed on CPUs and GPUs, but with the rapid growth of deep learning workloads, these general-purpose processors can no longer keep up with the large-scale computation involved, and customized deep learning accelerators have become popular. The dominant workload of most deep learning applications is General Matrix-Matrix Multiplication (GEMM), and emerging GEMMs are highly sparse and irregular. The TPU and SIGMA are state-of-the-art GEMM accelerators, but the TPU does not support sparsity, and SIGMA leaves some of its Processing Elements (PEs) underutilized. In this paper, we design and implement SparG, a flexible sparse GEMM accelerator. SparG features a specialized PE structure, a flexible distribution network, and an efficient reduction network. For sparse and irregular GEMMs, SparG maintains high PE utilization while exploiting sparsity. We run sparse and irregular GEMMs on the TPU, SIGMA, and SparG. The experimental results show that SparG achieves the highest performance (30x better than the TPU and 3.6x better than SIGMA) while incurring only a small amount of additional hardware overhead (~20% more than the TPU and ~10% more than SIGMA).
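The computation targeted by SparG can be illustrated in software. The sketch below is a minimal Python/NumPy analogy, not the SparG hardware dataflow; the matrix sizes and the roughly 80% sparsity level are arbitrary assumptions. It contrasts a dense GEMM, which performs every multiply-accumulate the way a dense systolic design such as the TPU does, with a sparsity-aware loop that skips multiply-accumulates whose operand is zero, which is the kind of ineffectual work a sparse accelerator avoids scheduling.

import numpy as np

def dense_gemm(A, B):
    # Baseline dense GEMM: performs M*K*N multiply-accumulates regardless of sparsity.
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N))
    for m in range(M):
        for k in range(K):
            for n in range(N):
                C[m, n] += A[m, k] * B[k, n]  # executed even when A[m, k] == 0
    return C

def sparse_gemm(A, B):
    # Sparsity-aware GEMM: skips multiply-accumulates whose A operand is zero.
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N))
    macs = 0
    for m in range(M):
        for k in range(K):
            a = A[m, k]
            if a == 0.0:
                continue
            for n in range(N):
                C[m, n] += a * B[k, n]
                macs += 1
    return C, macs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((64, 64)) * (rng.random((64, 64)) > 0.8)  # roughly 80% zeros
    B = rng.random((64, 64))
    C_sparse, macs = sparse_gemm(A, B)
    assert np.allclose(C_sparse, dense_gemm(A, B))
    print(f"useful MACs: {macs} of {64 * 64 * 64} dense MACs")

With highly sparse inputs, the useful-MAC count is a small fraction of the dense total, which is the utilization gap a sparsity-aware accelerator exploits.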

B. Wang, S. Ma, and Y. Yuan contributed equally to this research.



Acknowledgments

This work is supported in part by the National Key R&D Project No. 2021YFB0300300, the NSFC (62172430, 61872374, 62272476), and the NSF of Hunan Province (2021JJ10052, 2022JJ10064).

Author information


Correspondence to Bo Wang or Sheng Ma.


Copyright information

© 2023 Springer Nature Switzerland AG

About this paper


Cite this paper

Wang, B. et al. (2023). SparG: A Sparse GEMM Accelerator for Deep Learning Applications. In: Meng, W., Lu, R., Min, G., Vaidya, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2022. Lecture Notes in Computer Science, vol 13777. Springer, Cham. https://doi.org/10.1007/978-3-031-22677-9_28


  • DOI: https://doi.org/10.1007/978-3-031-22677-9_28


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22676-2

  • Online ISBN: 978-3-031-22677-9

  • eBook Packages: Computer Science (R0)
