DOI: 10.1145/3078597.3078616
research-article

To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations

Published: 26 June 2017

Abstract

We reduce the cost of communication and synchronization in graph processing by analyzing the fastest way to process graphs: pushing updates to a shared state or pulling updates to a private state. We investigate the applicability of this push-pull dichotomy to various algorithms and its impact on complexity, performance, and the number of locks, atomics, and reads/writes used. We consider 11 graph algorithms, 3 programming models, 2 graph abstractions, and various families of graphs. The analysis reveals surprising differences between the push and pull variants of different algorithms in performance, speed of convergence, and code complexity; the insights are backed by performance data from hardware counters. We use these findings to show which variant is faster for each algorithm and to develop generic strategies that enable even higher speedups. Our insights can be used to accelerate graph processing engines or libraries on both massively parallel shared-memory machines and distributed-memory systems.
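
To make the push-pull dichotomy concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single PageRank iteration as the example algorithm and a plain dictionary of out-neighbor lists as the graph representation; the function names `pagerank_push` and `pagerank_pull` are hypothetical. In the push variant each vertex writes into its neighbors' entries, which a parallel version would have to protect with atomics or locks; in the pull variant each vertex only reads its in-neighbors and writes its own entry.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # vertex -> list of out-neighbors (assumed layout)


def pagerank_push(out_edges: Graph, ranks: Dict[int, float],
                  damping: float = 0.85) -> Dict[int, float]:
    """One iteration where every vertex pushes its share to its out-neighbors."""
    n = len(out_edges)
    new_ranks = {v: (1.0 - damping) / n for v in out_edges}
    for u, neighbors in out_edges.items():
        if not neighbors:
            continue  # simplification: dangling vertices contribute nothing
        share = damping * ranks[u] / len(neighbors)
        for v in neighbors:
            # Write to another vertex's entry (shared state): a parallel
            # version would need an atomic add or a lock here.
            new_ranks[v] += share
    return new_ranks


def pagerank_pull(out_edges: Graph, ranks: Dict[int, float],
                  damping: float = 0.85) -> Dict[int, float]:
    """One iteration where every vertex pulls shares from its in-neighbors."""
    n = len(out_edges)
    # Build the reverse adjacency so each vertex can find its in-neighbors.
    in_edges: Graph = {v: [] for v in out_edges}
    for u, neighbors in out_edges.items():
        for v in neighbors:
            in_edges[v].append(u)
    new_ranks = {}
    for v in out_edges:
        # Read-only access to other vertices; the only write is to v's own
        # entry (private state), so no synchronization would be required.
        total = sum(ranks[u] / len(out_edges[u]) for u in in_edges[v])
        new_ranks[v] = (1.0 - damping) / n + damping * total
    return new_ranks


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [2], 2: [0]}
    start = {v: 1.0 / len(graph) for v in graph}
    print(pagerank_push(graph, start))
    print(pagerank_pull(graph, start))
```

Both functions compute the same values up to floating-point rounding; they differ only in which vertex performs the write. The sketch is sequential, so it shows where synchronization would be needed rather than measuring its cost.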

Published In

HPDC '17: Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing
June 2017
254 pages
ISBN:9781450346993
DOI:10.1145/3078597
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tag

  1. graph computations

Qualifiers

  • Research-article

Conference

HPDC '17

Acceptance Rates

HPDC '17 Paper Acceptance Rate 19 of 100 submissions, 19%;
Overall Acceptance Rate 166 of 966 submissions, 17%

Article Metrics

  • Downloads (Last 12 months): 79
  • Downloads (Last 6 weeks): 13
Reflects downloads up to 22 Feb 2025

Cited By

  • (2024) A high-performance design, implementation, deployment, and evaluation of the slim fly network. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, pages 1025-1044. DOI: 10.5555/3691825.3691882. Online publication date: 16-Apr-2024
  • (2024) Enabling Window-Based Monotonic Graph Analytics with Reusable Transitional Results for Pattern-Consistent Queries. Proceedings of the VLDB Endowment, 17:11, pages 3003-3016. DOI: 10.14778/3681954.3681979. Online publication date: 30-Aug-2024
  • (2024) Hypergraph-based locality-enhancing methods for graph operations in Big Data applications. International Journal of High Performance Computing Applications, 38:3, pages 210-224. DOI: 10.1177/10943420231214532. Online publication date: 1-May-2024
  • (2024) PIM-Potential: Broadening the Acceleration Reach of PIM Architectures. Proceedings of the International Symposium on Memory Systems, pages 1-12. DOI: 10.1145/3695794.3695795. Online publication date: 30-Sep-2024
  • (2024) Indigo3: A Parallel Graph Analytics Benchmark Suite for Exploring Implementation Styles and Common Bugs. ACM Transactions on Parallel Computing, 11:3, pages 1-29. DOI: 10.1145/3665251. Online publication date: 15-May-2024
  • (2024) TLPGNN: A Lightweight Two-level Parallelism Paradigm for Graph Neural Network Computation on Single and Multiple GPUs. ACM Transactions on Parallel Computing, 11:2, pages 1-28. DOI: 10.1145/3644712. Online publication date: 8-Jun-2024
  • (2024) High Performance Unstructured SpMM Computation Using Tensor Cores. SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-14. DOI: 10.1109/SC41406.2024.00060. Online publication date: 17-Nov-2024
  • (2024) Atomic Cache: Enabling Efficient Fine-Grained Synchronization with Relaxed Memory Consistency on GPGPUs Through In-Cache Atomic Operations. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 671-685. DOI: 10.1109/MICRO61859.2024.00056. Online publication date: 2-Nov-2024
  • (2024) Algorithms for Fast Spiking Neural Network Simulation on FPGAs. IEEE Access, 12, pages 150334-150353. DOI: 10.1109/ACCESS.2024.3479933. Online publication date: 2024
  • (2024) StarPlat. Journal of Parallel and Distributed Computing, 194:C. DOI: 10.1016/j.jpdc.2024.104967. Online publication date: 1-Dec-2024
