Skip to main content

MatGraph: An Energy-Efficient and Flexible CGRA Engine for Matrix-Based Graph Analytics

  • Conference paper
  • First Online:
Algorithms and Architectures for Parallel Processing (ICA3PP 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13777))

  • 1920 Accesses

Abstract

Graph analytics is increasingly important for solving problems in various fields. Matrix-based graph analytics has obtained much attention due to its high performance and ease of optimization. In the general architecture, due to the extremely high sparsity and complex connectedness of graphs, matrix-based graph analytics suffers from the deep and heavy pipeline as well as the low efficiency of the memory subsystem. Meanwhile, lots of accelerators based on application-specific integrated circuits (ASICs) for graph analytics are not flexible enough to support various matrix operations of diverse matrix-based graph algorithms, which have different graph semantics and dataflow.

In this paper, we present MatGraph, an energy-efficient and flexible architecture to support matrix-based graph analytics efficiently. MatGraph is based on coarse-grained reconfigurable architectures (CGRAs) which have both high energy efficiency and flexibility. According to the matrix operations on graphs, we conduct an abstract from the operators to define reduced instructions and design a lightweight pipeline to achieve high parallelism of instructions in CGRAs. To eliminate the impact of the highly sparse graph data, we design a bitmap-aware instruction filtering unit to filter out invalid instructions for each PE and increase the on-chip reuse of instructions. Furthermore, we propose a bidirectional data-aware sparsity removing scheme to eliminate the sparsity and redundant off-chip data accesses. Overall, MatGraph achieves 9.35x, 2.28x speedup, and 11.17x, 7.15x energy savings on average compared to state-of-the-art (SOTA) CPU-based and GPGPU-based solutions respectively. Compared to the SOTA graph analytics accelerator, MatGraph also achieves 1.59x speedup and 1.61x less energy.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Ahn, J., Hong, S., Yoo, S., Mutlu, O., Choi, K.: A scalable processing-in-memory accelerator for parallel graph processing. In: Proceedings of the 42nd Annual International Symposium on Computer Architecture, pp. 105–117 (2015)

    Google Scholar 

  2. Association, J.S.S.T., et al.: Jedec standard: Ddr4 sdram. JESD79-4, September 2012

    Google Scholar 

  3. Bullmore, E., Sporns, O.: Complex brain networks: graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. 10(3), 186–198 (2009)

    Article  Google Scholar 

  4. Buluç, A., Gilbert, J.R.: The combinatorial BLAS: design, implementation, and applications. Int. J. High Perform. Comput. Appl. 25(4), 496–509 (2011)

    Article  Google Scholar 

  5. Carter, N.P., et al.: Runnemede: an architecture for ubiquitous high-performance computing. In: 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA), pp. 198–209. IEEE (2013)

    Google Scholar 

  6. Chen, D.C., Rabaey, J.M.: A reconfigurable multiprocessor IC for rapid prototyping of algorithmic-specific high-speed DSP data paths. IEEE J. Solid-State Circuits 27(12), 1895–1904 (1992)

    Article  Google Scholar 

  7. Dai, G., Huang, T., Chi, Y., Xu, N., Wang, Y., Yang, H.: Foregraph: exploring large-scale graph processing on multi-FPGA architecture. In: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 217–226 (2017)

    Google Scholar 

  8. Davis, T.A., Hu, Y.: The University of Florida sparse matrix collection. ACM Trans. Math. Softw. (TOMS) 38(1), 1–25 (2011)

    MathSciNet  MATH  Google Scholar 

  9. DeHon, A.: Fundamental underpinnings of reconfigurable computing architectures. Proc. IEEE 103(3), 355–378 (2015)

    Article  Google Scholar 

  10. Estrin, G.: Organization of computer systems: the fixed plus variable structure computer. In: Papers Presented at the May 3–5, 1960, Western Joint IRE-AIEE-ACM Computer Conference, pp. 33–40 (1960)

    Google Scholar 

  11. Gao, G.R., Suetterlein, J., Zuckerman, S.: Toward an execution model for extreme-scale systems-runnemede and beyond. Technical Memo (2011)

    Google Scholar 

  12. Giorgi, R., et al.: Teraflux: harnessing dataflow in next generation teradevices. Microprocess. Microsyst. 38(8), 976–990 (2014)

    Article  Google Scholar 

  13. Ham, T.J., Wu, L., Sundaram, N., Satish, N., Martonosi, M.: Graphicionado: a high-performance and energy-efficient accelerator for graph analytics. In: 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1–13. IEEE (2016)

    Google Scholar 

  14. Hartenstein, R.W., Hirschbiel, A.G., Riedmuller, M., Schmidt, K., Weber, M.: A novel ASIC design approach based on a new machine paradigm. IEEE J. Solid-State Circuits 26(7), 975–989 (1991)

    Article  Google Scholar 

  15. Huang, G., Dai, G., Wang, Y., Yang, H.: Ge-SPMM: general-purpose sparse matrix-matrix multiplication on GPUs for graph neural networks. In: SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–12. IEEE (2020)

    Google Scholar 

  16. Ideker, T., Ozier, O., Schwikowski, B., Siegel, A.F.: Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 18(suppl_1), S233–S240 (2002)

    Google Scholar 

  17. Kim, Y., Yang, W., Mutlu, O.: Ramulator: a fast and extensible dram simulator. IEEE Comput. Archit. Lett. 15(1), 45–49 (2015)

    Article  Google Scholar 

  18. König, D.: Graphen und matrizen, mat. Lapok 38, 116–119 (1931)

    MATH  Google Scholar 

  19. Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web, pp. 591–600 (2010)

    Google Scholar 

  20. Kyrola, A., Blelloch, G., Guestrin, C.: Graphchi: large-scale graph computation on just a \(\{\)PC\(\}\). In: 10th \(\{\)USENIX\(\}\) Symposium on Operating Systems Design and Implementation (\(\{\)OSDI\(\}\) 12), pp. 31–46 (2012)

    Google Scholar 

  21. Malewicz, G., et al.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pp. 135–146 (2010)

    Google Scholar 

  22. Mattson, T., et al.: Standards for graph algorithm primitives. In: 2013 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–2. IEEE (2013)

    Google Scholar 

  23. Mattson, T., et al.: Lagraph: a community effort to collect graph algorithms built on top of the graphblas. In: 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 276–284. IEEE (2019)

    Google Scholar 

  24. Muralimanohar, N., Balasubramonian, R., Jouppi, N.P.: Cacti 6.0: a tool to model large caches. HP laboratories 27, 28 (2009)

    Google Scholar 

  25. Murphy, R.C., Wheeler, K.B., Barrett, B.W., Ang, J.A.: Introducing the graph 500. Cray Users Group (CUG) 19, 45–74 (2010)

    Google Scholar 

  26. Paden, B., Čáp, M., Yong, S.Z., Yershov, D., Frazzoli, E.: A survey of motion planning and control techniques for self-driving urban vehicles. IEEE Trans. Intell. Veh. 1(1), 33–55 (2016)

    Article  Google Scholar 

  27. Rahman, S., Abu-Ghazaleh, N., Gupta, R.: Graphpulse: an event-driven hardware accelerator for asynchronous graph processing. In: 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 908–921. IEEE (2020)

    Google Scholar 

  28. Roy, A., Mihailovic, I., Zwaenepoel, W.: X-stream: edge-centric graph processing using streaming partitions. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pp. 472–488 (2013)

    Google Scholar 

  29. Satish, N., et al.: Navigating the maze of graph analytics frameworks using massive graph datasets. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 979–990 (2014)

    Google Scholar 

  30. Song, L., Zhuo, Y., Qian, X., Li, H., Chen, Y.: GrapHR: accelerating graph processing using reram. In: 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 531–543. IEEE (2018)

    Google Scholar 

  31. Sundaram, N., et al.: GraphMAT: high performance graph analytics made productive. arXiv preprint arXiv:1503.07241 (2015)

  32. Tessier, R., Pocek, K., DeHon, A.: Reconfigurable computing architectures. Proc. IEEE 103(3), 332–354 (2015)

    Article  Google Scholar 

  33. Wang, Y., Davidson, A., Pan, Y., Wu, Y., Riffel, A., Owens, J.D.: Gunrock: a high-performance graph processing library on the GPU. In: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 1–12 (2016)

    Google Scholar 

  34. Yan, M., et al.: Alleviating irregularity in graph analytics acceleration: a hardware/software co-design approach. In: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 615–628 (2019)

    Google Scholar 

  35. Yang, C., Buluc, A., Owens, J.D.: Graphblast: A high-performance linear algebra-based graph framework on the GPU. arXiv preprint arXiv:1908.01407 (2019)

Download references

Acknowledgements

This work was supported by CAS Project for Young Scientists in Basic Research (Grant No. YSBR-029), the National Natural Science Foundation of China (Grant No. 61732018, and 61872335), Austrian-Chinese Cooperative R &D Project (FFG and CAS) (Grant No. 171111KYSB20200002), and CAS Project for Youth Innovation Promotion Association.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Mingyu Yan .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Tan, L., Yan, M., Wang, D., Li, W., Ye, X., Fan, D. (2023). MatGraph: An Energy-Efficient and Flexible CGRA Engine for Matrix-Based Graph Analytics. In: Meng, W., Lu, R., Min, G., Vaidya, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2022. Lecture Notes in Computer Science, vol 13777. Springer, Cham. https://doi.org/10.1007/978-3-031-22677-9_19

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-22677-9_19

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22676-2

  • Online ISBN: 978-3-031-22677-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics