To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations

Authors:

Michał Podstawski,

Edgar Solomonik,

Torsten Hoefler

HPDC '17: Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing

Pages 93 - 104

https://doi.org/10.1145/3078597.3078616

Published: 26 June 2017

Abstract

We reduce the cost of communication and synchronization in graph processing by analyzing which way of processing graphs is fastest: pushing updates to a shared state or pulling updates to a private state. We investigate the applicability of this push-pull dichotomy to various algorithms and its impact on complexity, performance, and the number of locks, atomics, and reads/writes used. We consider 11 graph algorithms, 3 programming models, 2 graph abstractions, and various families of graphs. The analysis reveals surprising differences between the push and pull variants of different algorithms in performance, speed of convergence, and code complexity; the insights are backed by performance data from hardware counters. We use these findings to show which variant is faster for each algorithm and to develop generic strategies that enable even higher speedups. Our insights can be used to accelerate graph processing engines or libraries on both massively parallel shared-memory machines and distributed-memory systems.
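To make the push-pull dichotomy concrete, here is a minimal sequential sketch using PageRank as the example. It is not the paper's implementation: the function names, the dictionary-based graph layout, and the damping factor of 0.85 are assumptions chosen for illustration, and the code assumes every vertex has at least one outgoing edge. In the push variant each vertex writes into its neighbors' entries (shared state, which a parallel version would have to guard with atomics or locks); in the pull variant each vertex only reads its in-neighbors and writes its own entry (private state).

```python
# Illustrative sketch only: names and data layout are assumptions, not the paper's code.
# Assumes every vertex has at least one outgoing edge (no dangling vertices).

def transpose(out_edges):
    """Build in-neighbor lists from out-neighbor lists."""
    in_edges = {v: [] for v in out_edges}
    for u, targets in out_edges.items():
        for v in targets:
            in_edges[v].append(u)
    return in_edges


def pagerank_push(out_edges, iters=20, d=0.85):
    """Push: each vertex u adds its share to the entries of its out-neighbors."""
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    for _ in range(iters):
        nxt = {v: (1.0 - d) / n for v in out_edges}
        for u, targets in out_edges.items():
            share = d * rank[u] / len(targets)
            for v in targets:
                # Write to another vertex's entry: in a parallel version this
                # is a conflicting update that needs an atomic or a lock.
                nxt[v] += share
        rank = nxt
    return rank


def pagerank_pull(out_edges, iters=20, d=0.85):
    """Pull: each vertex v reads its in-neighbors and writes only its own entry."""
    n = len(out_edges)
    in_edges = transpose(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    for _ in range(iters):
        nxt = {}
        for v in out_edges:
            # Only reads of other vertices' state; the single write is private to v.
            acc = sum(rank[u] / len(out_edges[u]) for u in in_edges[v])
            nxt[v] = (1.0 - d) / n + d * acc
        rank = nxt
    return rank


if __name__ == "__main__":
    toy_graph = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}  # hypothetical example graph
    print(pagerank_push(toy_graph))
    print(pagerank_pull(toy_graph))
```

Run sequentially, the two variants compute the same ranks up to floating-point rounding. They differ in which memory locations are written and by whom, and that difference determines how many locks, atomics, and reads/writes a parallel version needs.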

Published In

HPDC '17: Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing

June 2017

254 pages

ISBN:9781450346993

DOI:10.1145/3078597

General Chairs: Howie Huang (George Washington University, USA), Jon Weissman (University of Minnesota, USA)

Program Chairs: Adriana Iamnitchi (University of South Florida, USA), Alexandru Iosup (Vrije Universiteit Amsterdam and Delft University of Technology, NLD)

Copyright © 2017 ACM.

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

University of Arizona
SIGARCH: ACM Special Interest Group on Computer Architecture

In-Cooperation

SIGHPC: ACM Special Interest Group on High Performance Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tag

graph computations

Qualifiers

Research-article

Conference

HPDC '17

HPDC '17: The 26th International Symposium on High-Performance Parallel and Distributed Computing

June 26 - 30, 2017

Washington, DC, USA

Acceptance Rates

HPDC '17 Paper Acceptance Rate 19 of 100 submissions, 19%;

Overall Acceptance Rate 166 of 966 submissions, 17%

Bibliometrics

Total Citations: 111
Total Downloads: 1,093
Downloads (Last 12 months): 79
Downloads (Last 6 weeks): 13

Reflects downloads up to 23 Feb 2025

Cited By

Blach N, Besta M, De Sensi D, Domke J, Harake H, Li S, Iff P, Konieczny M, Lakhotia K, Kubicek A, Ferrari M, Petrini F, Hoefler T, Vanbever L, Zhang I (2024). A high-performance design, implementation, deployment, and evaluation of the slim fly network. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, pages 1025-1044. Online publication date: 16-Apr-2024.
https://dl.acm.org/doi/10.5555/3691825.3691882

Chen Z, Zhang F, Chen Y, Fang X, Feng G, Zhu X, Chen W, Du X (2024). Enabling Window-Based Monotonic Graph Analytics with Reusable Transitional Results for Pattern-Consistent Queries. Proceedings of the VLDB Endowment, 17:11, pages 3003-3016. Online publication date: 30-Aug-2024.
https://doi.org/10.14778/3681954.3681979

Akbudak K (2024). Hypergraph-based locality-enhancing methods for graph operations in Big Data applications. International Journal of High Performance Computing Applications, 38:3, pages 210-224. Online publication date: 1-May-2024.
https://dl.acm.org/doi/10.1177/10943420231214532

Alsop J, Aga S, Ibrahim M, Islam M, Jayasena N, McCrabb A (2024). PIM-Potential: Broadening the Acceleration Reach of PIM Architectures. Proceedings of the International Symposium on Memory Systems, pages 1-12. Online publication date: 30-Sep-2024.
https://dl.acm.org/doi/10.1145/3695794.3695795

Liu Y, Azami N, Vanausdal A, Burtscher M (2024). Indigo3: A Parallel Graph Analytics Benchmark Suite for Exploring Implementation Styles and Common Bugs. ACM Transactions on Parallel Computing, 11:3, pages 1-29. Online publication date: 15-May-2024.
https://dl.acm.org/doi/10.1145/3665251

Fu Q, Ji Y, Rolinger T, Huang H (2024). TLPGNN: A Lightweight Two-level Parallelism Paradigm for Graph Neural Network Computation on Single and Multiple GPUs. ACM Transactions on Parallel Computing, 11:2, pages 1-28. Online publication date: 8-Jun-2024.
https://dl.acm.org/doi/10.1145/3644712

Okanovic P, Kwasniewski G, Labini P, Besta M, Vella F, Hoefler T (2024). High Performance Unstructured SpMM Computation Using Tensor Cores. SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-14. Online publication date: 17-Nov-2024.
https://doi.org/10.1109/SC41406.2024.00060

Zhang Y, Wang M, Wang W, Mai Y, Huang H, Yu Z (2024). Atomic Cache: Enabling Efficient Fine-Grained Synchronization with Relaxed Memory Consistency on GPGPUs Through In-Cache Atomic Operations. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 671-685. Online publication date: 2-Nov-2024.
https://doi.org/10.1109/MICRO61859.2024.00056

Lindqvist B, Podobas A (2024). Algorithms for Fast Spiking Neural Network Simulation on FPGAs. IEEE Access, 12, pages 150334-150353. Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3479933

Behera N, Kumar A, Rajadurai T E, Nitish S, M R, Nasre R (2024). StarPlat. Journal of Parallel and Distributed Computing, 194:C. Online publication date: 1-Dec-2024.
https://dl.acm.org/doi/10.1016/j.jpdc.2024.104967
