Abstract
Graph analytics is increasingly important for solving problems in many fields. Matrix-based graph analytics has attracted much attention because of its high performance and ease of optimization. On general-purpose architectures, however, the extreme sparsity and complex connectivity of graphs leave matrix-based graph analytics with a deep, heavy pipeline and an inefficient memory subsystem. Meanwhile, many accelerators built on application-specific integrated circuits (ASICs) for graph analytics are not flexible enough to support the varied matrix operations of diverse matrix-based graph algorithms, which differ in graph semantics and dataflow.
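As a rough illustration of what matrix-based graph analytics means, the sketch below expresses breadth-first search as repeated boolean matrix-vector products over an adjacency matrix, and reports how sparse that matrix is. It is a generic textbook formulation, not code from the paper; the function names and the tiny example graph are assumptions made for illustration.

```python
def bfs_levels(n, edges, source):
    """BFS written as repeated boolean matrix-vector products.

    Illustrative only: the adjacency matrix is stored column by column,
    so out_nbrs[u] lists every v for which the edge u -> v exists.
    """
    out_nbrs = [[] for _ in range(n)]
    for u, v in edges:
        out_nbrs[u].append(v)

    level = [-1] * n            # -1 marks an unreached vertex
    frontier = [False] * n      # current frontier as a boolean vector
    frontier[source] = True
    level[source] = 0
    depth = 0
    while any(frontier):
        depth += 1
        nxt = [False] * n
        # One product over the (OR, AND) semiring, masked by "not visited":
        # only columns whose frontier entry is set contribute any work.
        for u in range(n):
            if frontier[u]:
                for v in out_nbrs[u]:
                    if level[v] == -1:
                        nxt[v] = True
        for v in range(n):
            if nxt[v]:
                level[v] = depth
        frontier = nxt
    return level


def density(n, edges):
    """Fraction of nonzero entries in the n x n adjacency matrix."""
    return len(set(edges)) / (n * n)


# Hypothetical 6-vertex graph; vertex 5 is isolated.
example_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
print(bfs_levels(6, example_edges, 0))
print(density(6, example_edges))
```

Even in this small example most matrix entries are zero, and real graphs are far sparser, which is why a dense evaluation of each product would waste nearly all of its work.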
In this paper, we present MatGraph, an energy-efficient and flexible architecture for matrix-based graph analytics. MatGraph is based on coarse-grained reconfigurable architectures (CGRAs), which offer both high energy efficiency and flexibility. We abstract the matrix operations on graphs into a set of reduced instructions and design a lightweight pipeline that achieves high instruction parallelism in CGRAs. To eliminate the impact of highly sparse graph data, we design a bitmap-aware instruction filtering unit that filters out invalid instructions for each processing element (PE) and increases the on-chip reuse of instructions. Furthermore, we propose a bidirectional data-aware sparsity-removing scheme to eliminate sparsity and redundant off-chip data accesses. Overall, MatGraph achieves average speedups of 9.35x and 2.28x and average energy savings of 11.17x and 7.15x over state-of-the-art (SOTA) CPU-based and GPGPU-based solutions, respectively. Compared with the SOTA graph analytics accelerator, MatGraph achieves a 1.59x speedup and consumes 1.61x less energy.
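The general idea behind bitmap-based filtering can be sketched in software as follows. This is not MatGraph's filtering unit, whose design the abstract does not detail; it is a minimal sketch assuming each matrix row carries a bitmap of its nonzero columns and the input vector carries a bitmap of its nonzero entries, so that a multiply-accumulate is issued only where both bits are set. The names pack_row and filtered_spmv are hypothetical.

```python
def pack_row(dense_row):
    """Return (bitmap, values): bit j of bitmap is set when column j is
    nonzero, and values holds the nonzeros in ascending column order."""
    bitmap, values = 0, []
    for j, a in enumerate(dense_row):
        if a != 0:
            bitmap |= 1 << j
            values.append(a)
    return bitmap, values


def filtered_spmv(dense_matrix, x):
    """Compute y = A x, issuing a multiply-accumulate only when both the
    matrix entry and the vector entry are nonzero (assumed scheme)."""
    x_bitmap = 0
    for j, xj in enumerate(x):
        if xj != 0:
            x_bitmap |= 1 << j

    y = [0] * len(dense_matrix)
    issued = 0                      # operations that survive filtering
    for i, dense_row in enumerate(dense_matrix):
        row_bitmap, values = pack_row(dense_row)
        valid = row_bitmap & x_bitmap   # bitwise AND acts as the filter
        k = 0                           # index into the packed values
        j = 0                           # current column
        bits = row_bitmap
        while bits:
            if bits & 1:
                if (valid >> j) & 1:
                    y[i] += values[k] * x[j]
                    issued += 1
                k += 1
            bits >>= 1
            j += 1
    return y, issued


# Hypothetical 4 x 4 example: a dense evaluation would need 16 operations.
A = [[0, 2, 0, 0],
     [1, 0, 0, 3],
     [0, 0, 0, 0],
     [0, 0, 4, 5]]
x = [1, 0, 0, 2]
y, issued = filtered_spmv(A, x)
print(y, issued)
```

In this example only 3 of the 16 possible multiply-accumulates are issued, while the result is identical to the dense product.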
Acknowledgements
This work was supported by the CAS Project for Young Scientists in Basic Research (Grant No. YSBR-029), the National Natural Science Foundation of China (Grant Nos. 61732018 and 61872335), the Austrian-Chinese Cooperative R&D Project (FFG and CAS) (Grant No. 171111KYSB20200002), and the CAS Project for Youth Innovation Promotion Association.
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Tan, L., Yan, M., Wang, D., Li, W., Ye, X., Fan, D. (2023). MatGraph: An Energy-Efficient and Flexible CGRA Engine for Matrix-Based Graph Analytics. In: Meng, W., Lu, R., Min, G., Vaidya, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2022. Lecture Notes in Computer Science, vol 13777. Springer, Cham. https://doi.org/10.1007/978-3-031-22677-9_19
DOI: https://doi.org/10.1007/978-3-031-22677-9_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22676-2
Online ISBN: 978-3-031-22677-9
eBook Packages: Computer Science, Computer Science (R0)