PPR-partitioning: a distributed graph partitioning algorithm based on the personalized PageRank vectors in vertex-centric systems

  • Regular Paper
  • Knowledge and Information Systems

Abstract

In most big data sets, such as social network data, the relations among data items can be modeled as graphs. This modeling produces big graphs with many vertices and edges. Balanced k-way graph partitioning is a common problem on such graphs and has applications in several fields. Many approximate solutions exist; however, most of them do not scale to big graphs and cannot be executed in a distributed manner. The vertex-centric model was recently introduced as a scalable distributed processing method for big graphs, and a few graph partitioning methods are based on it. These existing approaches consider only the one-step neighbors of a vertex and ignore neighbors that are more steps away. This paper introduces a distributed method for balanced k-way graph partitioning based on the vertex-centric model. The method uses the personalized PageRank vectors of vertices and partitions to decide which partition each vertex joins. It has been implemented in the Giraph system and evaluated on several synthetic and real graphs. Experimental results show that the method scales to big graphs. It also produces higher-quality partitions than state-of-the-art stream-based methods and distributed methods based on the vertex-centric programming model, and its results are close to those of the Metis method.
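To make the central idea concrete, the following is a minimal single-machine sketch of how a personalized PageRank vector can reach beyond one-step neighbors and guide a partition choice. It is not the algorithm of the paper, whose details are not given in this preview: the seed-per-partition scheme, the restart probability `alpha = 0.15`, the function names, and the toy graph are all assumptions made for illustration, and the sketch does not enforce balance.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Approximate the personalized PageRank vector of `seed` by power iteration.

    adj   : dict mapping each vertex to a list of neighbors (undirected,
            every vertex has at least one neighbor).
    alpha : restart probability (0.15 is an assumed, commonly used value).
    """
    rank = {v: 0.0 for v in adj}
    rank[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[seed] += alpha  # restart: a fraction alpha of the mass returns to the seed
        for v, nbrs in adj.items():
            share = (1.0 - alpha) * rank[v] / len(nbrs)
            for u in nbrs:
                nxt[u] += share  # the rest spreads evenly over the neighbors
        rank = nxt
    return rank


def edge_cut(adj, part):
    """Count undirected edges whose endpoints lie in different partitions."""
    return sum(1 for v in adj for u in adj[v] if v < u and part[v] != part[u])


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}

# Assumption: each of the k = 2 partitions is represented by one seed vertex.
ppr_a = personalized_pagerank(adj, seed=0)
ppr_b = personalized_pagerank(adj, seed=5)

# Each vertex joins the partition whose seed gives it the larger score.
part = {v: 0 if ppr_a[v] >= ppr_b[v] else 1 for v in adj}
cut = edge_cut(adj, part)
print(part, cut)
```

Because the score of a vertex aggregates random walks of every length from the seed, vertices two or more steps away still receive a nonzero score, which is the information that one-step neighbor methods do not use.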

Author information

Corresponding author

Correspondence to Afsaneh Fatemi.

Ethics declarations

Funding

No funding was received by the authors.

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Mazaheri Soudani, N., Fatemi, A. & Nematbakhsh, M. PPR-partitioning: a distributed graph partitioning algorithm based on the personalized PageRank vectors in vertex-centric systems. Knowl Inf Syst 61, 847–871 (2019). https://doi.org/10.1007/s10115-019-01328-3

  • DOI: https://doi.org/10.1007/s10115-019-01328-3
