GEM: Execution-Aware Cache Management for Graph Analytics

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13777)

Abstract

Graph analytics plays a significant role in many application domains, but its performance is limited by inefficient use of the cache hierarchy. In recent years, many works have focused on eliminating irregular data accesses to accelerate graph analytics. However, we find that even regular data accesses cannot fully utilize the cache hierarchy, because current cache management is independent of execution characteristics. To this end, we propose GEM, a Graph-specialized Execution-aware cache Management scheme for the L1D cache. GEM learns from the execution patterns of graph analytics and applies customized cache management to regular data accesses without any pre-processing phase. More specifically, GEM identifies when regular data accesses will occur and employs length-aware fetch and reuse-aware replacement accordingly. We implement GEM on a popular multi-core simulator and evaluate its performance on various algorithms using several large real-world graphs. The results show that GEM outperforms the state-of-the-art graph-specialized cache management by \(21.1\%\) on average and by up to \(44.5\%\) in the best case, with up to a \(66\%\) reduction in expensive off-chip memory accesses.
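To make the distinction between regular and irregular accesses concrete, the following is a minimal illustrative sketch, not GEM's mechanism. It assumes a compressed sparse row (CSR) graph layout, which the abstract does not specify; the names `neighbor_runs`, `property_access_order`, and `IDS_PER_LINE` are hypothetical. Under that assumption, each vertex's neighbor list is a contiguous run whose length is known from the offsets array before the run is read, while the per-neighbor property lookups follow the graph structure and jump around.

```python
from typing import List, Tuple

# Assumption for illustration: 4-byte neighbor IDs and 64-byte cache lines.
IDS_PER_LINE = 16


def neighbor_runs(offsets: List[int]) -> List[Tuple[int, int, int]]:
    """For each vertex, return (start index, run length, cache lines touched)
    for its contiguous slice of the CSR neighbor array."""
    runs = []
    for v in range(len(offsets) - 1):
        start, end = offsets[v], offsets[v + 1]
        length = end - start
        if length == 0:
            lines = 0
        else:
            # Count the distinct cache lines spanned by indices start..end-1.
            lines = (end - 1) // IDS_PER_LINE - start // IDS_PER_LINE + 1
        runs.append((start, length, lines))
    return runs


def property_access_order(offsets: List[int], neighbors: List[int]) -> List[int]:
    """Indices into a per-vertex property array, in the order a traversal
    that visits every vertex's neighbors would touch them."""
    order = []
    for v in range(len(offsets) - 1):
        for i in range(offsets[v], offsets[v + 1]):
            order.append(neighbors[i])
    return order


# Tiny hypothetical graph: 4 vertices, 6 directed edges.
offsets = [0, 3, 3, 5, 6]
neighbors = [1, 3, 2, 0, 3, 1]

print(neighbor_runs(offsets))                     # sequential runs, lengths known up front
print(property_access_order(offsets, neighbors))  # data-dependent, non-sequential indices
```

In this sketch the run length is available as soon as two adjacent offsets are read, which is the kind of execution-time information a length-aware fetch could plausibly use; how GEM actually detects and exploits it is described in the full paper, not here.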

Notes

  1. Source code from https://github.com/ChampSim/ChampSim/blob/master/replacement/drrip.llc_repl.

  2. Source code from https://github.com/faldupriyank/grasp.

Acknowledgements

This work was supported by the CAS Project for Young Scientists in Basic Research (Grant No. YSBR-029), the National Natural Science Foundation of China (Grant Nos. 61732018 and 61872335), the Austrian-Chinese Cooperative R&D Project (FFG and CAS) (Grant No. 171111KYSB20200002), and the CAS Project for Youth Innovation Promotion Association.

Author information

Corresponding author

Correspondence to Mo Zou.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Zou, M., Yan, M., Li, W., Tang, Z., Ye, X., Fan, D. (2023). GEM: Execution-Aware Cache Management for Graph Analytics. In: Meng, W., Lu, R., Min, G., Vaidya, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2022. Lecture Notes in Computer Science, vol 13777. Springer, Cham. https://doi.org/10.1007/978-3-031-22677-9_15

  • DOI: https://doi.org/10.1007/978-3-031-22677-9_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22676-2

  • Online ISBN: 978-3-031-22677-9

  • eBook Packages: Computer Science (R0)
