MatGraph: An Energy-Efficient and Flexible CGRA Engine for Matrix-Based Graph Analytics

Tan, Long; Yan, Mingyu; Wang, Duo; Li, Wenming; Ye, Xiaochun; Fan, Dongrui

doi:10.1007/978-3-031-22677-9_19

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13777)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

Graph analytics is increasingly important for solving problems in many fields. Matrix-based graph analytics has attracted much attention because of its high performance and ease of optimization. On general-purpose architectures, however, the extremely high sparsity and complex connectivity of graphs cause matrix-based graph analytics to suffer from a deep, heavyweight pipeline and an inefficient memory subsystem. Meanwhile, many accelerators based on application-specific integrated circuits (ASICs) for graph analytics are not flexible enough to support the various matrix operations of diverse matrix-based graph algorithms, which differ in graph semantics and dataflow.
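As a rough illustration of what "matrix-based" means here, breadth-first search can be written as repeated sparse matrix-vector products over a boolean semiring, with the frontier as the vector. The sketch below is a generic textbook formulation in plain Python, not code from the paper; the function name `bfs_levels` and the CSR arrays `indptr` and `indices` are assumptions chosen for the example.

```python
def bfs_levels(indptr, indices, n, source):
    """Level-synchronous BFS written as repeated sparse matrix-vector products.

    The graph is a CSR adjacency matrix (an assumed layout for this sketch):
    the out-neighbours of vertex u are indices[indptr[u]:indptr[u + 1]].
    Returns the BFS level of every vertex, or -1 if it is unreachable.
    """
    levels = [-1] * n
    levels[source] = 0
    frontier = [False] * n
    frontier[source] = True
    depth = 0
    while any(frontier):
        depth += 1
        next_frontier = [False] * n
        # One step is y = A^T x over the (OR, AND) semiring,
        # masked so that already visited vertices are not revisited.
        for u in range(n):
            if not frontier[u]:
                continue  # zero entry in x: no work for this row
            for v in indices[indptr[u]:indptr[u + 1]]:
                if levels[v] == -1:
                    levels[v] = depth
                    next_frontier[v] = True
        frontier = next_frontier
    return levels


# Edges: 0->1, 0->2, 1->3, 2->3; vertex 4 is isolated.
example_indptr = [0, 2, 3, 4, 4, 4]
example_indices = [1, 2, 3, 3]
example_levels = bfs_levels(example_indptr, example_indices, 5, 0)
```

Most frontier entries are zero in any given step, which is the sparsity the abstract refers to: a dense pipeline would spend most of its cycles on rows that contribute nothing.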

In this paper, we present MatGraph, an energy-efficient and flexible architecture that supports matrix-based graph analytics efficiently. MatGraph is based on coarse-grained reconfigurable architectures (CGRAs), which offer both high energy efficiency and flexibility. Based on the matrix operations used on graphs, we abstract the operators to define reduced instructions and design a lightweight pipeline that achieves high instruction parallelism in CGRAs. To eliminate the impact of highly sparse graph data, we design a bitmap-aware instruction filtering unit that filters out invalid instructions for each processing element (PE) and increases the on-chip reuse of instructions. Furthermore, we propose a bidirectional data-aware sparsity-removing scheme to eliminate sparsity and redundant off-chip data accesses. Overall, MatGraph achieves average speedups of 9.35x and 2.28x and average energy savings of 11.17x and 7.15x compared to state-of-the-art (SOTA) CPU-based and GPGPU-based solutions, respectively. Compared to the SOTA graph analytics accelerator, MatGraph also achieves a 1.59x speedup and consumes 1.61x less energy.
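The abstract does not describe how the bitmap-aware filtering unit is built, so the following is only a software analogy for the general idea of using bitmaps to skip work on zero operands. It assumes a hypothetical compressed format in which bit i of a bitmap is set when element i is non-zero and the non-zero values are packed in index order; the function name `filtered_dot` and the `issued` counter are illustrative and do not come from the paper.

```python
def filtered_dot(row_bitmap, row_values, vec_bitmap, vec_values):
    """Dot product of two bitmap-compressed sparse vectors.

    Bit i of a bitmap is 1 when element i is non-zero; non-zero values are
    packed in ascending index order (an assumed format for this sketch).
    Returns (result, issued), where issued counts the multiply-accumulate
    operations that survive filtering.
    """
    both = row_bitmap & vec_bitmap  # positions where both operands are non-zero
    total = 0
    issued = 0
    while both:
        low = both & -both          # isolate the lowest set bit
        below = low - 1             # mask of all positions below it
        # The packed offset equals the number of non-zeros before this position.
        r = bin(row_bitmap & below).count("1")
        v = bin(vec_bitmap & below).count("1")
        total += row_values[r] * vec_values[v]
        issued += 1
        both ^= low                 # clear the bit just processed
    return total, issued


# Row has non-zeros at positions 0, 3, 5; vector at positions 3, 4, 5.
example_result = filtered_dot(0b101001, [2, 4, 6], 0b111000, [10, 20, 30])
```

In this example only two of the six positions produce a multiply-accumulate; the other four are filtered out by a single bitwise AND before any value is fetched.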

Acknowledgements

This work was supported by the CAS Project for Young Scientists in Basic Research (Grant No. YSBR-029), the National Natural Science Foundation of China (Grant Nos. 61732018 and 61872335), the Austrian-Chinese Cooperative R&D Project (FFG and CAS) (Grant No. 171111KYSB20200002), and the CAS Project for Youth Innovation Promotion Association.

Author information

Authors and Affiliations

State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Long Tan, Mingyu Yan, Duo Wang, Wenming Li, Xiaochun Ye & Dongrui Fan
School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China
Long Tan & Duo Wang

Corresponding author

Correspondence to Mingyu Yan.

Editor information

Editors and Affiliations

Technical University of Denmark, Kongens Lyngby, Denmark
Weizhi Meng
University of New Brunswick, Fredericton, NB, Canada
Rongxing Lu
University of Exeter, Exeter, UK
Geyong Min
Rutgers University, Newark, NJ, USA
Jaideep Vaidya

About this paper

Cite this paper

Tan, L., Yan, M., Wang, D., Li, W., Ye, X., Fan, D. (2023). MatGraph: An Energy-Efficient and Flexible CGRA Engine for Matrix-Based Graph Analytics. In: Meng, W., Lu, R., Min, G., Vaidya, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2022. Lecture Notes in Computer Science, vol 13777. Springer, Cham. https://doi.org/10.1007/978-3-031-22677-9_19

DOI: https://doi.org/10.1007/978-3-031-22677-9_19
Published: 11 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22676-2
Online ISBN: 978-3-031-22677-9
eBook Packages: Computer Science, Computer Science (R0)
