Abstract
As an important part of distributed graph computing, graph partitioning has been widely studied. However, most existing approaches to distributed graph partitioning barely consider the relationship between the partitioning strategy and the characteristics of the graph algorithm being run. Current graph partitioning strategies are based on the graph structure, and their performance varies when running different graph algorithms. In this paper, taking the characteristics of the graph algorithm into account, we propose a distributed graph partitioning framework that uses program analysis to decide which partitioning strategy is better suited. We also design a triangle-based partitioning strategy that benefits several graph algorithms, such as triangle counting. In addition, to reduce the number of shuffle operations in the PageRank algorithm, we redesign PageRank to take the corresponding hash partitioning into account. The experimental results show that our adaptive graph partitioning framework, implemented on GraphX, outperforms the original GraphX, especially for triangle counting and PageRank, on both real-world and synthetic graphs.


















Data availability statement
All data included in this study are available upon request by contact with the corresponding author.
Acknowledgements
This work is supported by the Science and Technology Research Project of Hebei Higher Education Institutions [No. QN2020133] and the Natural Science Foundation of Hebei Province of China [No. F2019201361].
Cite this article
Zhai, X., Zhang, H., Huang, X. et al. Graph partitioning strategies: one size does not fit all. J Supercomput 78, 19272–19295 (2022). https://doi.org/10.1007/s11227-022-04620-2
DOI: https://doi.org/10.1007/s11227-022-04620-2