DOI: 10.1145/2623330.2623660

Balanced graph edge partition

Published: 24 August 2014

Abstract

Balanced edge partition has emerged as a new approach to partitioning input graph data for the purpose of scaling out parallel computations. It is of interest for several modern data analytics computation platforms, including platforms for iterative computations, machine learning problems, and graph databases. This new approach stands in stark contrast to the traditional approach of balanced vertex partition, where, for a given number of partitions, the problem is to minimize the number of edges cut subject to balancing the vertex cardinality of the partitions. In this paper, we first characterize the expected costs of vertex and edge partitions, with and without aggregation of messages, for the commonly deployed policy of assigning a vertex or an edge uniformly at random to one of the partitions. We then obtain the first approximation algorithms for the balanced edge-partition problem, which, for the case of no aggregation, match the best known approximation ratio for the balanced vertex-partition problem, and we show that this continues to hold for the case with aggregation, up to a factor equal to the maximum in-degree of a vertex. We report the results of an extensive empirical evaluation on a set of real-world graphs, which quantifies the benefits of edge partition over vertex partition and demonstrates the efficiency of natural greedy online assignments for the balanced edge-partition problem with and without aggregation.
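To make the contrast between uniformly random placement and a greedy online assignment concrete, the following is a minimal Python sketch. It is not the algorithm or the cost model from the paper: the function names, the tie-breaking rule, the hard capacity of ceil(|E| / k) edges per partition, and the use of the total number of vertex copies as a stand-in for communication cost are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def vertex_copies(assignment):
    """Total number of (vertex, partition) pairs induced by an edge assignment.

    Illustrative proxy cost (an assumption, not the paper's definition):
    a vertex is counted once for every partition holding one of its edges.
    """
    parts_of = defaultdict(set)
    for (u, v), p in assignment.items():
        parts_of[u].add(p)
        parts_of[v].add(p)
    return sum(len(parts) for parts in parts_of.values())


def random_edge_partition(edges, k, seed=0):
    """Assign each edge to one of k partitions uniformly at random."""
    rng = random.Random(seed)
    return {edge: rng.randrange(k) for edge in edges}


def greedy_edge_partition(edges, k):
    """Hypothetical greedy online rule, processing edges in stream order.

    Each edge goes to the non-full partition that would gain the fewest
    new vertex copies; ties are broken by the lower current load.
    """
    capacity = math.ceil(len(edges) / k)  # assumed hard balance constraint
    load = [0] * k
    seen = [set() for _ in range(k)]  # vertices already present per partition
    assignment = {}
    for u, v in edges:
        candidates = [p for p in range(k) if load[p] < capacity]
        best = min(
            candidates,
            key=lambda p: ((u not in seen[p]) + (v not in seen[p]), load[p]),
        )
        assignment[(u, v)] = best
        load[best] += 1
        seen[best].update((u, v))
    return assignment


if __name__ == "__main__":
    # Two disjoint triangles, listed as distinct undirected edges.
    demo_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
    print("random:", vertex_copies(random_edge_partition(demo_edges, 2)))
    print("greedy:", vertex_copies(greedy_edge_partition(demo_edges, 2)))
```

The sketch assumes a simple graph whose edges are distinct tuples, since edges are used as dictionary keys. On inputs with clear community structure, the greedy rule tends to keep each vertex in few partitions, whereas random placement spreads the edges of a vertex across partitions.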

Supplementary Material

MP4 File (p1456-sidebyside.mp4)

Published In

KDD '14: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2014
2028 pages
ISBN: 9781450329569
DOI: 10.1145/2623330
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. approximation algorithms
  2. distributed massive computation
  3. graph edge partition
  4. streaming heuristics

Qualifiers

  • Research-article

Conference

KDD '14
Acceptance Rates

KDD '14 Paper Acceptance Rate: 151 of 1,036 submissions, 15%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
