Julienne: A Framework for Parallel Graph Algorithms using Work-efficient Bucketing

Authors:
Laxman Dhulipala, Carnegie Mellon University, Pittsburgh, PA, USA
Guy Blelloch, Carnegie Mellon University, Pittsburgh, PA, USA
Julian Shun, University of California Berkeley, Berkeley, CA, USA

SPAA '17: Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, July 2017, Pages 293–304
https://doi.org/10.1145/3087556.3087580

Published: 24 July 2017
ABSTRACT

Existing graph-processing frameworks let users develop efficient implementations for many graph problems, but none of them supports efficient bucketing of vertices, which is needed for bucketing-based graph algorithms such as Δ-stepping and approximate set cover. Motivated by the lack of simple, scalable, and efficient implementations of bucketing-based algorithms, we develop the Julienne framework, which extends a recent shared-memory graph processing framework called Ligra with an interface for maintaining a collection of buckets under vertex insertions and bucket deletions.
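
To make the idea of maintaining buckets under vertex insertions and bucket deletions concrete, the following is a minimal sequential sketch in Python. It is not Julienne's interface or implementation: the class name `Buckets` and the methods `update` and `next_bucket` are hypothetical names chosen for illustration, and the sketch makes no attempt at parallelism or work-efficiency (finding the lowest non-empty bucket scans all bucket ids).

```python
from collections import defaultdict


class Buckets:
    """Sequential toy bucket structure (hypothetical API, not Julienne's)."""

    def __init__(self, initial):
        # initial: dict mapping vertex -> bucket id (a non-negative int)
        self.bucket_of = dict(initial)
        self.members = defaultdict(set)
        for v, b in initial.items():
            self.members[b].add(v)

    def update(self, moves):
        """Insert or move each vertex in `moves` (vertex -> new bucket id)."""
        for v, b in moves.items():
            old = self.bucket_of.get(v)
            if old is not None:
                self.members[old].discard(v)
            self.bucket_of[v] = b
            self.members[b].add(v)

    def next_bucket(self):
        """Delete the lowest non-empty bucket and return (id, vertices).

        Returns None when every bucket is empty.
        """
        live = [b for b, vs in self.members.items() if vs]
        if not live:
            return None
        b = min(live)
        vs = self.members.pop(b)
        for v in vs:
            del self.bucket_of[v]
        return b, sorted(vs)
```

A bucketing-based algorithm would typically alternate between calling `next_bucket` to obtain the vertices to process and calling `update` to move their neighbors to new buckets.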

We provide a theoretically efficient parallel implementation of our bucketing interface and study several bucketing-based algorithms that make use of it (either bucketing by remaining degree or by distance) to improve performance: the peeling algorithm for k-core (coreness), Δ-stepping, weighted breadth-first search, and approximate set cover. The implementations are all simple and concise (under 100 lines of code). Using our interface, we develop the first work-efficient parallel algorithm for k-core in the literature with nontrivial parallelism.
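
As an illustration of bucketing by remaining degree, the peeling algorithm for coreness can be sketched sequentially in Python as follows. This is a sketch under stated assumptions, not the paper's parallel algorithm: the function name `coreness` and the adjacency-dictionary input format are assumptions made for the example, and each bucket is processed by an ordinary loop, whereas a parallel version would process all vertices of a bucket together.

```python
def coreness(adj):
    """Coreness of every vertex by peeling buckets of remaining degree.

    adj: dict mapping vertex -> set of neighbors (undirected, simple graph).
    Sequential illustration only; names and input format are assumptions.
    """
    degree = {v: len(ns) for v, ns in adj.items()}
    max_deg = max(degree.values(), default=0)
    buckets = [set() for _ in range(max_deg + 1)]
    for v, d in degree.items():
        buckets[d].add(v)

    core = {}
    k = 0
    while k <= max_deg:
        if not buckets[k]:
            k += 1
            continue
        # Take the whole bucket as one peeling round.
        frontier = buckets[k]
        buckets[k] = set()
        for v in frontier:
            core[v] = k
        for v in frontier:
            for u in adj[v]:
                if u in core:
                    continue  # already peeled
                # Remaining degree is clamped so it never drops below k.
                new_d = max(degree[u] - 1, k)
                if new_d != degree[u]:
                    buckets[degree[u]].discard(u)
                    degree[u] = new_d
                    buckets[new_d].add(u)
    return core
```

For a triangle with one pendant vertex attached, the pendant vertex is peeled in bucket 1 and the three triangle vertices are peeled together in bucket 2.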

We experimentally show that our bucketing implementation scales well and achieves high throughput on both synthetic and real-world workloads. Furthermore, the bucketing-based algorithms written in Julienne achieve up to 43x speedup on 72 cores with hyper-threading over well-tuned sequential baselines, significantly outperform existing work-inefficient implementations in Ligra, and either outperform or are competitive with existing special-purpose parallel codes for the same problem. We experimentally study our implementations on the largest publicly available graphs and show that they scale well in practice, processing real-world graphs with billions of edges in seconds, and hundreds of billions of edges in a few minutes. As far as we know, this is the first time that graphs at this scale have been analyzed in the main memory of a single multicore machine.

Index Terms

Software and its engineering → Software notations and tools → General programming languages → Language types → Parallel programming languages
Theory of computation → Design and analysis of algorithms → Parallel algorithms → Shared memory algorithms

Published in
SPAA '17: Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures
July 2017
392 pages
ISBN:9781450345934
DOI:10.1145/3087556
General Chair: Christian Scheideler, Paderborn University, Germany
Program Chair: Mohammad Hajiaghayi, University of Maryland at College Park, USA
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
experiments
graph algorithms
parallel programming
shared memory
Qualifiers
- research-article
Acceptance Rates
SPAA '17 Paper Acceptance Rate: 31 of 127 submissions, 24%
Overall Acceptance Rate: 447 of 1,461 submissions, 31%