Abstract
Graph analytics plays a significant role in various application domains. However, its performance is limited by the inefficiency of the cache hierarchy. In recent years, many works have focused on eliminating irregular data accesses to accelerate graph analytics. However, we find that even regular data accesses cannot fully utilize the cache hierarchy, because current cache management is independent of execution characteristics. To this end, we propose GEM, a Graph-specialized Execution-aware cache Management scheme at the L1D cache. GEM perceives execution patterns in graph analytics and applies customized cache management to regular data accesses without any pre-processing phase. More specifically, GEM identifies when regular data accesses will occur and employs a length-aware fetch policy and a reuse-aware replacement policy accordingly. We implement GEM on a popular multi-core simulator and evaluate its performance on various algorithms using several large real-world graphs. The results show that GEM outperforms the state-of-the-art graph-specialized cache management by \(21.1\%\) on average and by up to \(44.5\%\) in the best case, while reducing expensive off-chip memory accesses by up to \(66\%\).
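The abstract distinguishes regular from irregular data accesses. The sketch below is our own illustration and is not GEM's mechanism. It looks at a pull-style sweep over a graph stored in compressed sparse row (CSR) form. In that sweep, the `offsets` and `neighbors` arrays are scanned sequentially, which makes them regular accesses. The property reads `prop[u]`, by contrast, are indexed by neighbor ID, which makes them irregular.

A neighbor list's length, `offsets[v+1] - offsets[v]`, is known before the scan begins. The cache lines the list will touch can therefore be computed up front. This is one plausible reading of the information a "length-aware" fetch could exploit. The 64-byte line size, the 4-byte vertex IDs, and the helper names `neighbor_lines` and `pull_iteration` are all hypothetical.

```python
# Hypothetical parameters (not taken from the paper): 64-byte cache lines,
# 4-byte vertex IDs stored in the CSR neighbor array.
LINE_BYTES = 64
ID_BYTES = 4


def neighbor_lines(offsets, v, base_addr=0):
    """Return the cache-line indices covered by vertex v's neighbor list.

    In CSR, the neighbors of v occupy neighbors[offsets[v]:offsets[v + 1]],
    a contiguous (regular) range whose length is known before the scan.
    base_addr is a hypothetical byte address of neighbors[0].
    """
    start, end = offsets[v], offsets[v + 1]
    if start == end:  # vertex with no neighbors touches no lines
        return []
    first_byte = base_addr + start * ID_BYTES
    last_byte = base_addr + end * ID_BYTES - 1
    return list(range(first_byte // LINE_BYTES, last_byte // LINE_BYTES + 1))


def pull_iteration(offsets, neighbors, prop):
    """One pull-style sweep: for each vertex, sum prop over its listed neighbors.

    offsets and neighbors are streamed in order (regular accesses);
    prop[u] is indexed by a data-dependent neighbor ID (irregular accesses).
    """
    n = len(offsets) - 1
    out = [0.0] * n
    for v in range(n):
        total = 0.0
        for i in range(offsets[v], offsets[v + 1]):  # regular: sequential scan
            u = neighbors[i]
            total += prop[u]  # irregular: index depends on graph structure
        out[v] = total
    return out


# Usage: a 3-vertex graph in CSR form.
print(pull_iteration([0, 2, 3, 4], [1, 2, 0, 0], [1.0, 2.0, 3.0]))
print(neighbor_lines([0, 2, 3, 3, 20], 3))
```

Under these assumptions, the neighbor list of vertex 3 (entries 3 to 19, bytes 12 to 79) spans two 64-byte lines. An execution-aware cache could learn this from the offsets alone, before the scan starts.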
Notes
- 1.
- 2. Source code from https://github.com/faldupriyank/grasp.
Acknowledgements
This work was supported by the CAS Project for Young Scientists in Basic Research (Grant No. YSBR-029), the National Natural Science Foundation of China (Grant Nos. 61732018 and 61872335), the Austrian-Chinese Cooperative R&D Project (FFG and CAS) (Grant No. 171111KYSB20200002), and the CAS Project for Youth Innovation Promotion Association.
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Zou, M., Yan, M., Li, W., Tang, Z., Ye, X., Fan, D. (2023). GEM: Execution-Aware Cache Management for Graph Analytics. In: Meng, W., Lu, R., Min, G., Vaidya, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2022. Lecture Notes in Computer Science, vol 13777. Springer, Cham. https://doi.org/10.1007/978-3-031-22677-9_15
DOI: https://doi.org/10.1007/978-3-031-22677-9_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22676-2
Online ISBN: 978-3-031-22677-9
eBook Packages: Computer Science, Computer Science (R0)