Abstract
Graph partitioning is crucial to parallel computations on large graphs. The choice of partitioning strategy has a strong impact on the performance of graph algorithms. For a given algorithm of interest, which partitioning strategy fits it best and improves its parallel execution? Is it possible to provide a uniform partition to a batch of algorithms that run on the same graph simultaneously, and speed up each and every one of them? This paper aims to answer these questions. We propose an application-driven hybrid partitioning strategy that, given a graph algorithm \({{\mathcal {A}}}\), learns a cost model for \({{\mathcal {A}}}\) as a polynomial regression. We develop partitioners that, given the learned cost model, refine an edge-cut or vertex-cut partition to a hybrid partition and reduce the parallel cost of \({{\mathcal {A}}}\). Moreover, we extend the cost-driven strategy to support multiple algorithms at the same time and reduce the parallel cost of each of them. Using real-life and synthetic graphs, we experimentally verify that our partitioning strategy improves the performance of a variety of graph algorithms, by up to \(22.5\times \).
Notes
We do not include the result of \(\mathsf {CN}\) since there exists no official implementation for \(\mathsf {CN}\) with Gunrock.
Appendix: More experimental study
1.1 Impact of different phases
We tested the phases of \({\mathsf {ParE2H}}\) and \({\mathsf {ParV2H}}\) for their effectiveness. Denote by \({\mathsf {ParE2H}}_{k}\) (resp. \({\mathsf {ParV2H}}_{k}\)) (\(1\le k \le 3\)) the partitioner with the first k phases of \({\mathsf {ParE2H}}\) (resp. \({\mathsf {ParV2H}}\)). We assessed the speedup gain of the kth phase of \({\mathsf {ParE2H}}\) by comparing \({\mathsf {ParE2H}}_{k-1}\) and \({\mathsf {ParE2H}}_{k}\); similarly for \({\mathsf {ParV2H}}\). Figures 11a and 11b report the normalized speedup ratio over \(\mathsf {Twitter}\) with \(n=96\) for \(\mathsf {HxtraPuLP}\) and \(\mathsf {HGrid}\), respectively. The results over \(\mathsf {liveJournal}\) and \(\mathsf {UKWeb}\), and with other hybrid partitioners, are consistent (not shown). We find the following.
(1) \({\mathsf {ParE2H}}\). (a) Phase \({\mathsf {EMigrate}}\) accounts for 67.5%, 26.3%, 83.5%, 74.4% and 89.2% of the total speedup of \(\mathsf {CN}\), \(\mathsf {TC}\), \(\mathsf {WCC}\), \(\mathsf {PR}\) and \(\mathsf {SSSP}\), respectively. (b) \({\mathsf {ESplit}}\) alone improves \(\mathsf {CN}\) and \(\mathsf {TC}\) by 1.1 and 2.7 times, respectively. For \(\mathsf {WCC}\), \(\mathsf {PR}\) and \(\mathsf {SSSP}\), its impact is smaller, since \(\mathsf {CN}\) and \(\mathsf {TC}\) are more sensitive to workload imbalance. The impact of \({\mathsf {ESplit}}\) on \(\mathsf {CN}\) over \(\mathsf {Twitter}\) is smaller, since we filtered large-degree vertices for \(\mathsf {CN}\). Without filtering, \({\mathsf {ESplit}}\) improves \(\mathsf {CN}\) over \(\mathsf {liveJournal}\) by 1.9 times. (c) \({\mathsf {MAssign}}\) accounts for another 22.3%, 30.1%, 13.8%, 21.9% and 6.3% of the speedup of \(\mathsf {CN}\), \(\mathsf {TC}\), \(\mathsf {WCC}\), \(\mathsf {PR}\) and \(\mathsf {SSSP}\), respectively.
(2) \({\mathsf {ParV2H}}\). (a) Phase \({\mathsf {VMigrate}}\) contributes the most to the speedup of \(\mathsf {CN}\), \(\mathsf {TC}\), \(\mathsf {WCC}\), \(\mathsf {PR}\) and \(\mathsf {SSSP}\), accounting for about 71.2%, 81.2%, 87.1%, 78.2% and 96.7% of the total speedup, respectively. (b) By merging v-cut nodes into e-cut nodes, \({\mathsf {VMerge}}\) contributes 16.5%, 5.8%, 2.6%, 7.1% and 1.2% of the total speedup for the five algorithms tested, respectively. (c) Phase \({\mathsf {MAssign}}\) contributes 9.9% on average.
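To make the per-phase attribution above concrete, the following is a minimal sketch of how such shares could be derived from measured runtimes. It rests on assumptions that the text does not spell out: that the speedup after k phases is the runtime under the input partition divided by the runtime under the partition produced by the first k phases, and that the share of phase k is its speedup gain divided by the total gain. The function name `phase_shares` and the runtimes in the usage example are hypothetical and are not measurements from the paper.

```python
from typing import List


def phase_shares(runtimes: List[float]) -> List[float]:
    """Attribute the total speedup gain to individual refinement phases.

    runtimes[0] is the runtime on the unrefined input partition (assumed
    baseline); runtimes[k] is the runtime on the partition produced by the
    first k phases. Returns one fraction per phase; fractions sum to 1.
    """
    if len(runtimes) < 2:
        raise ValueError("need a baseline and at least one phase")
    if any(t <= 0 for t in runtimes):
        raise ValueError("runtimes must be positive")

    baseline = runtimes[0]
    # Speedup after k phases, relative to the input partition.
    speedups = [baseline / t for t in runtimes]

    total_gain = speedups[-1] - speedups[0]
    if total_gain <= 0:
        raise ValueError("refinement produced no overall speedup")

    # Gain of phase k is the difference between consecutive speedups.
    return [
        (speedups[k] - speedups[k - 1]) / total_gain
        for k in range(1, len(speedups))
    ]


if __name__ == "__main__":
    # Hypothetical runtimes in seconds: input partition, then after
    # phase 1, phase 2 and phase 3.
    shares = phase_shares([120.0, 60.0, 48.0, 40.0])
    for k, share in enumerate(shares, start=1):
        print(f"phase {k}: {share:.1%} of total speedup gain")
```

Under this definition the shares of the three phases always sum to 100%, which matches how the percentages reported for \({\mathsf {ParE2H}}\) can be read; other normalizations are possible and would change the numbers.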
Cite this article
Fan, W., Xu, R., Yin, Q. et al. Application-driven graph partitioning. The VLDB Journal 32, 149–172 (2023). https://doi.org/10.1007/s00778-022-00736-2
DOI: https://doi.org/10.1007/s00778-022-00736-2