Abstract:
Streaming edge partitioning plays a crucial role in the distributed processing of large-scale web graphs, such as pagerank. The quality of partitioning is of utmost impor...Show MoreMetadata
Abstract:
Streaming edge partitioning plays a crucial role in the distributed processing of large-scale web graphs, such as pagerank. The quality of partitioning is of utmost importance and directly affects the runtime cost of distributed graph processing. However, streaming graph clustering, a key component of mainstream streaming edge partitioning, is vertex-centric. This incurs a mismatch with the edge-centric partitioning strategy, necessitating additional post-processing and several graph traversals to transition from vertex-centric clusters to edge-centric partitions. This transition not only adds extra runtime overhead but also risks a decline in partitioning quality. In this paper, we propose a novel algorithm, called ClusPar, to address the problem of streaming edge partitioning. The ClusPar framework consists of two steps, streaming edge clustering and edge cluster partitioning. Different from prior studies, the first step traverses the input graph in a single pass to generate edge-centric clusters, while the second step applies game theory over these edge-centric clusters to produce partitions. Extensive experiments show that ClusPar outperforms the state-of-the-art streaming edge partitioning methods in terms of the partitioning quality, efficiency, and scalability.
Published in: IEEE Transactions on Computers ( Volume: 74, Issue: 1, January 2025)