GEM: Execution-Aware Cache Management for Graph Analytics

Zou, Mo; Yan, Mingyu; Li, Wenming; Tang, Zhimin; Ye, Xiaochun; Fan, Dongrui

doi:10.1007/978-3-031-22677-9_15

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13777)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

Graph analytics plays a significant role in many application domains, but its performance is limited by inefficiencies in the cache hierarchy. In recent years, many works have focused on eliminating irregular data accesses to accelerate graph analytics. However, we find that even regular data accesses cannot fully utilize the cache hierarchy, because current cache management is independent of execution characteristics. To this end, we propose GEM, a Graph-specialized, Execution-aware cache Management scheme for the L1D cache. GEM learns from the execution patterns of graph analytics and applies customized cache management to regular data accesses, without any pre-processing phase. More specifically, GEM identifies when regular data accesses will occur and applies length-aware fetch and reuse-aware replacement accordingly. We implement GEM on a popular multi-core simulator and evaluate it on various algorithms using several large real-world graphs. The results show that GEM outperforms the state-of-the-art graph-specialized cache management by 21.1% on average and by up to 44.5% in the best case, with up to a 66% reduction in expensive off-chip memory accesses.

Notes

1. Source code from https://github.com/ChampSim/ChampSim/blob/master/replacement/drrip.llc_repl.
2. Source code from https://github.com/faldupriyank/grasp.

Acknowledgements

This work was supported by the CAS Project for Young Scientists in Basic Research (Grant No. YSBR-029), the National Natural Science Foundation of China (Grant Nos. 61732018 and 61872335), the Austrian-Chinese Cooperative R&D Project (FFG and CAS) (Grant No. 171111KYSB20200002), and the CAS Project for Youth Innovation Promotion Association.

Author information

Authors and Affiliations

State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China
Mo Zou, Mingyu Yan, Wenming Li, Zhimin Tang, Xiaochun Ye & Dongrui Fan
University of Chinese Academy of Sciences, Beijing, China
Mo Zou & Zhimin Tang

Corresponding author

Correspondence to Mo Zou.

Editor information

Editors and Affiliations

Technical University of Denmark, Kongens Lyngby, Denmark
Weizhi Meng
University of New Brunswick, Fredericton, NB, Canada
Rongxing Lu
University of Exeter, Exeter, UK
Geyong Min
Rutgers University, Newark, NJ, USA
Jaideep Vaidya

About this paper

Cite this paper

Zou, M., Yan, M., Li, W., Tang, Z., Ye, X., Fan, D. (2023). GEM: Execution-Aware Cache Management for Graph Analytics. In: Meng, W., Lu, R., Min, G., Vaidya, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2022. Lecture Notes in Computer Science, vol 13777. Springer, Cham. https://doi.org/10.1007/978-3-031-22677-9_15

DOI: https://doi.org/10.1007/978-3-031-22677-9_15
Published: 11 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22676-2
Online ISBN: 978-3-031-22677-9
eBook Packages: Computer Science, Computer Science (R0)
