SparG: A Sparse GEMM Accelerator for Deep Learning Applications

Abstract
Deep learning has become a highly active field of research. Deep learning algorithms were previously run mainly on CPUs and GPUs, but with the rapid growth of deep learning workloads, these general-purpose processors can no longer sustain their large-scale computations, and customized deep learning accelerators have become popular. The main workload of most deep learning applications is General Matrix-matrix Multiplication (GEMM), and emerging GEMMs are highly sparse and irregular. The TPU and SIGMA are state-of-the-art GEMM accelerators of recent years, but the TPU does not support sparsity, and SIGMA suffers from under-utilization of some Processing Elements (PEs). In this paper, we design and implement SparG, a flexible sparse GEMM accelerator. SparG features a specialized PE structure, a flexible distribution network, and an efficient reduction network. For sparse and irregular GEMMs, SparG maintains high PE utilization while exploiting sparsity. We run sparse and irregular GEMMs on the TPU, SIGMA, and SparG. The experimental results show that SparG achieves the highest performance (30x better than the TPU and 3.6x better than SIGMA) while incurring only a small amount of additional hardware overhead (~20% more than the TPU and ~10% more than SIGMA).
B. Wang, S. Ma, and Y. Yuan contributed equally to this research.
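The core workload discussed in the abstract is GEMM over operands that are largely zero. As a minimal illustration of why sparsity matters (this is not the SparG dataflow or hardware mapping, and the CSR layout and function name below are assumptions chosen for the example), the sketch multiplies a sparse matrix stored in compressed sparse row (CSR) form with a dense matrix, so multiplications with zero entries are skipped entirely.

```python
# Minimal sketch of a sparse-times-dense GEMM (C = A @ B) where A is stored
# in CSR form, so multiplications with zero entries of A are skipped.
# Illustration of the sparse GEMM workload only; not the SparG architecture.

def csr_gemm(values, col_idx, row_ptr, B, n_rows, n_cols_B):
    """values/col_idx/row_ptr are the CSR arrays of the sparse matrix A."""
    C = [[0.0] * n_cols_B for _ in range(n_rows)]
    for i in range(n_rows):
        # Only the nonzeros of row i are visited; on highly sparse inputs
        # this is where the work savings over a dense GEMM come from.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a = values[k]
            j = col_idx[k]
            for c in range(n_cols_B):
                C[i][c] += a * B[j][c]
    return C

# A = [[1, 0, 0],
#      [0, 0, 2]]   (2x3, only 2 nonzeros)
values, col_idx, row_ptr = [1.0, 2.0], [0, 2], [0, 1, 2]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3x2 dense operand
print(csr_gemm(values, col_idx, row_ptr, B, n_rows=2, n_cols_B=2))
# -> [[1.0, 2.0], [10.0, 12.0]]
```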
Acknowledgments
This work is supported in part by the National Key R&D Project No. 2021YFB0300300, the NSFC (62172430, 61872374, 62272476), and the NSF of Hunan Province (2021JJ10052, 2022JJ10064).