Skip to main content
Log in

Graph partitioning strategies: one size does not fit all

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

As an important part of distributed graph computing, graph partitioning has been widely studied. However, the majority of the existing approaches to distributed graph partitioning barely take into consideration the relationship between the partition strategy and the graph algorithm’s characteristics. The current graph partitioning strategies are on the basis of the graph structure, and the performance is diverse as running different graph algorithms. In this paper, considering the characteristics of a graph algorithm, we propose a distributed graph partitioning framework that can decide which partitioning strategy is better based on the program analysis. We also design a triangle-based partitioning strategy which benefits several graph algorithms such as the triangle counting. In addition, to reduce the number of shuffling operations in PageRank algorithm, we redesign the PageRank algorithm which considers the corresponding hash partitioning. The experimental results show that our adaptive graph partitioning framework implemented on Graphx surpasses the original one, especially for the triangle counting and PageRank, no matter on real-world graphs or synthetic graphs.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18

Similar content being viewed by others

Data availability statement

All data included in this study are available upon request by contact with the corresponding author.

References

  1. Buluç A, Gilbert JR (2011) The combinatorial blas: design, implementation, and applications. Int J High Perform Comput Appl 25(4):496–509

    Article  Google Scholar 

  2. Cheng R, Hong J, Kyrola A, Miao Y, Weng X, Wu M, Yang F, Zhou L, Zhao F, Chen E (2012) Kineograph: taking the pulse of a fast-changing and connected world. In: Proceedings of the 7th ACM European Conference on Computer Systems, pp 85–98

  3. Stutz P, Bernstein A, Cohen W (2010) Signal/collect: graph algorithms for the (semantic) web. In: International Semantic Web Conference, pp 764–780. Springer

  4. Çatalyürek ÜV, Aykanat C (1996) Decomposing irregularly sparse matrices for parallel matrix-vector multiplication. In: Ferreira A, Rolim J, Saad Y, Yang T (eds) Parallel algorithms for irregularly structured problems, pp 75–86. Springer, Berlin

  5. Malewicz G, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pp 135–146

  6. Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein JM (2012) Distributed graphlab: a framework for machine learning in the cloud. arXiv preprint arXiv:1204.6078

  7. Gonzalez JE, Low Y, Gu H, Bickson D, Guestrin C (2012) Powergraph: distributed graph-parallel computation on natural graphs. In: 10th \(\{\)USENIX\(\}\) Symposium on Operating Systems Design and Implementation (\(\{\)OSDI\(\}\) 12), pp 17–30

  8. Gonzalez JE, Xin RS, Dave A, Crankshaw D, Franklin MJ, Stoica I (2014) Graphx: graph processing in a distributed dataflow framework. In: 11th \(\{\)USENIX\(\}\) Symposium on Operating Systems Design and Implementation (\(\{\)OSDI\(\}\) 14), pp 599–613

  9. Chen R, Shi J, Chen Y, Zang B, Guan H, Chen H (2019) Powerlyra: differentiated graph computation and partitioning on skewed graphs. ACM Trans Parall Comput TOPC 5(3):1–39

    Article  Google Scholar 

  10. Dean J, Ghemawat S (2004) Mapreduce: simplified data processing on large clusters

  11. Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauly M, Franklin MJ, Shenker S, Stoica I (2012) Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: 9th \(\{\)USENIX\(\}\) Symposium on Networked Systems Design and Implementation (\(\{\)NSDI\(\}\) 12), pp 15–28

  12. Page Lawrence, Brin Sergey, Motwani Rajeev, Winograd Terry (1999) The pagerank citation ranking: bringing order to the web. Technical report, Stanford InfoLab

    Google Scholar 

  13. Karypis George, Kumar Vipin (1999) Parallel multilevel series k-way partitioning scheme for irregular graphs. SIAM Rev 41(2):278–300

    Article  MathSciNet  MATH  Google Scholar 

  14. Schloegel K, Karypis G, Kumar V (2000) Parallel multilevel algorithms for multi-constraint graph partitioning. In: Bode A, Ludwig T, Karl W, Wismüller R (eds) Euro-Par 2000 parallel processing, pp 296–310. Springer, Heidelberg

  15. Faloutsos M, Faloutsos P, Faloutsos C (2011) On power-law relationships of the internet topology. In: The Structure and Dynamics of Networks, pp 195–206. Princeton University Press, Princeton

  16. Newman MEJ (2005) Power laws, pareto distributions and Zipf’s law. Contemp Phys 46(5):323–351

    Article  Google Scholar 

  17. Feige U, Hajiaghayi MT, Lee JR (2008) Improved approximation algorithms for minimum weight vertex separators. SIAM J Comput 38(2):629–657

    Article  MathSciNet  MATH  Google Scholar 

  18. Zhang Y, Li D, Zhang C, Wang J, Liu L (2017) Grapha: efficient partitioning and storage for distributed graph computation. IEEE Trans Serv Comput 14(1):155–166

    Google Scholar 

  19. Karypis G, Kumar V (1998) Multilevelk-way partitioning scheme for irregular graphs. J Parall Distrib Comput 48(1):96–129

    Article  Google Scholar 

  20. Stanton I, Kliot G (2012) Streaming graph partitioning for large distributed graphs. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 1222–1230

  21. Tsourakakis C, Gkantsidis C, Radunovic B, Vojnovic M (2014) Fennel: Streaming graph partitioning for massive scale graphs. In: Proceedings of the 7th ACM International Conference on Web Search and Data Mining, pp 333–342

  22. Liu X, Zhou Y, Guan X, Shen C (2017) A feasible graph partition framework for parallel computing of big graph. Knowl Based Syst 134:228–239

    Article  Google Scholar 

  23. Zhu X, Chen W, Zheng W, Ma X (2016) Gemini: a computation-centric distributed graph processing system. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp 301–316, Savannah, GA, November. USENIX Association

  24. Xie C, Yan L, Li WJ, Zhang Z (2014) Distributed power-law graph computing: theoretical and empirical analysis. Nips 27:1673–1681

    Google Scholar 

  25. Jain N, Liao G, Willke TL (2013) Graphbuilder: scalable graph etl framework. In: First International Workshop on Graph Data Management Experiences and Systems, GRADES ’13, New York, NY, USA. Association for Computing Machinery

  26. Roshan D, Gurbinder G, Loc H, Hoang-Vu D, Alex B, Nikoli D, Marc S, Keshav P (2018) Gluon: a communication-optimizing substrate for distributed heterogeneous graph analytics. In: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp 752–768

  27. Slota GM, Root C, Devine K, Madduri K, Rajamanickam S (2020) Scalable, multi-constraint, complex-objective graph partitioning. IEEE Trans Parall Distrib Syst 31(12):2789–2801

    Article  Google Scholar 

  28. Hoang L, Dathathri R, Gill G, Pingali K (2019) Cusp: a customizable streaming edge partitioner for distributed graph analytics. In: 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp 439–450. IEEE

  29. Gill G, Dathathri R, Hoang L, Pingali K (2018) A study of partitioning policies for graph analytics on large-scale distributed platforms. Proceedings of the VLDB Endowment 12(4):321–334

    Article  Google Scholar 

  30. Boman EG, Devine KD, Rajamanickam S (2013) Scalable matrix computations on large scale-free graphs using 2d graph partitioning. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp 1–12

  31. Leskovec J, Krevl A(2014) Snap datasets: Stanford large network dataset collection

  32. DIMACS (2006) The 9th dimacs implementation challenge—shortest paths

  33. Dharavath R, Singh AN (2019) Spark’s graphx-based link prediction for social communication using triangle counting. Social Netw Anal Min 9(1):1–12

  34. http://konect.cc

  35. Tang J, Zhang J, Yao L, Li J (2008) Extraction and mining of an academic social network. In: Proceedings of the 17th International Conference on World Wide Web, pp 1193–1194

  36. Ding Y, Yan S, Zhang Y, Dai W, Dong L (2016) Predicting the attributes of social network users using a graph-based machine learning method. Comput Commun 73:3–11

    Article  Google Scholar 

  37. Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Systems 30(1–7):107–117

    Article  Google Scholar 

  38. Alazawi Z, Abdljabar MB, Altowaijri S, Vegni AM, Mehmood R (2012) Icdms: an intelligent cloud based disaster management system for vehicular networks. In: International Workshop on Communication Technologies for Vehicles, pp 40–56. Springer

  39. Tian Y, Mceachin RC, Santos C, States DJ, Patel JM (2007) Saga: a subgraph matching tool for biological graphs. Bioinformatics 23(2):232–239

    Article  Google Scholar 

  40. Somyung O, Ha J, Lee K, Sejong O (2017) Degoviz: an interactive visualization tool for a differentially expressed genes heatmap and gene ontology graph. App Sci 7(6):543

    Article  Google Scholar 

  41. Ying D (2011) Scientific collaboration and endorsement: network analysis of coauthorship and citation networks. J Inform 5(1):187–203

    Article  Google Scholar 

  42. Dave A, Jindal A, Li LE, Xin R, Gonzalez J, Zaharia M (2016) Graphframes: an integrated api for mixing graph and relational queries. In: Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems, pp 1–8

Download references

Acknowledgements

This work is supported by Science and Technology Research Project of Hebei Higher Education Institutions [No. QN2020133], and the Natural Science Foundation of Hebei Province of China [No. F2019201361].

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hong Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zhai, X., Zhang, H., Huang, X. et al. Graph partitioning strategies: one size does not fit all. J Supercomput 78, 19272–19295 (2022). https://doi.org/10.1007/s11227-022-04620-2

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-022-04620-2

Keywords

Navigation