Abstract
Parallel computation is often a must when processing large-scale graphs. However, it is nontrivial to write parallel graph algorithms with correctness guarantees. This paper presents the programming model of \(\mathsf {GRAPE}\), a parallel GRAPh Engine [19]. \(\mathsf {GRAPE}\) allows users to “plug in” sequential (single-machine) graph algorithms as a whole, and it parallelizes the algorithms across a cluster of processors. In other words, it simplifies parallel programming for graph computations, from “think parallel” to “think sequential”. Under a monotonic condition, it guarantees convergence to correct answers as long as the plugged-in sequential algorithms are correct. We present the foundation underlying \(\mathsf {GRAPE}\), based on simultaneous fixpoint computation. As examples, we demonstrate how \(\mathsf {GRAPE}\) parallelizes familiar sequential graph algorithms. Moreover, we show that in addition to its programming simplicity, \(\mathsf {GRAPE}\) achieves performance comparable to that of state-of-the-art graph systems.
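To make the fixpoint model concrete, the following is a minimal single-process sketch (not \(\mathsf {GRAPE}\)'s actual API; the names `dijkstra` and `grape_sssp` are ours) of the idea described above: a sequential single-source shortest-path algorithm is plugged in unchanged, run on each graph fragment, and the fragments exchange improved distance bounds on shared border vertices until a simultaneous fixpoint is reached. The aggregate is `min`, which is monotonic, so the iteration converges to the same answer the sequential algorithm would compute on the whole graph.

```python
import heapq

INF = float("inf")

def dijkstra(adj, seeds):
    """The 'plugged-in' sequential algorithm: multi-source Dijkstra
    on one fragment, seeded with currently known distance bounds."""
    dist = dict(seeds)
    pq = [(d, v) for v, d in seeds.items()]
    heapq.heapify(pq)
    while pq:
        d, v = heapq.heappop(pq)
        if d > dist.get(v, INF):
            continue  # stale queue entry
        for u, w in adj.get(v, ()):
            if d + w < dist.get(u, INF):
                dist[u] = d + w
                heapq.heappush(pq, (d + w, u))
    return dist

def grape_sssp(fragments, source):
    """Simultaneous fixpoint: rerun the sequential algorithm per
    fragment, merge border values with min (monotonic), repeat
    until no fragment can improve any bound."""
    global_dist = {source: 0}
    changed = True
    while changed:
        changed = False
        for adj in fragments:
            verts = set(adj) | {u for es in adj.values() for u, _ in es}
            seeds = {v: global_dist[v] for v in verts if v in global_dist}
            if not seeds:
                continue
            for v, d in dijkstra(adj, seeds).items():
                if d < global_dist.get(v, INF):
                    global_dist[v] = d  # monotonically decreasing
                    changed = True
    return global_dist

# Toy graph split into two fragments sharing border vertex "c".
f1 = {"a": [("b", 1), ("c", 5)], "b": [("c", 1)]}
f2 = {"c": [("d", 2)], "d": [("e", 1)]}
print(grape_sssp([f1, f2], "a"))
# → {'a': 0, 'b': 1, 'c': 2, 'd': 4, 'e': 5}
```

Note that the workers here are simulated by a loop; in an actual distributed setting each fragment would run on its own processor and only border updates would cross the network, which is precisely what makes the monotonicity of the merge step matter for convergence.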
Notes
- 1.
\({\mathsf {GraphLab}} _{{\mathsf {sync}}}\) and \({\mathsf {GraphLab}} _{{\mathsf {async}}}\) run different modes of \(\mathsf {GraphLab}\) (PowerGraph).
References
DBpedia. http://wiki.dbpedia.org/Datasets
Friendster. https://snap.stanford.edu/data/com-Friendster.html
Giraph. http://giraph.apache.org/
Traffic. http://www.dis.uniroma1.it/challenge9/download.shtml
UKWeb. http://law.di.unimi.it/webdata/uk-union-2006-06-2007-05/, 2006
Acar, U.A.: Self-adjusting computation. Ph.D thesis, CMU (2005)
Andreev, K., Racke, H.: Balanced graph partitioning. Theory Comput. Syst. 39(6), 929–939 (2006)
Bader, D.A., Cong, G.: Fast shared-memory algorithms for computing the minimum spanning forest of sparse graphs. J. Parallel Distrib. Comput. 66(11), 1366–1378 (2006)
Bang-Jensen, J., Gutin, G.Z.: Digraphs: Theory, Algorithms and Applications. Springer, Berlin (2008)
Baudet, G.M.: Asynchronous iterative methods for multiprocessors. J. ACM 25(2), 226–244 (1978)
Bertsekas, D.P.: Distributed asynchronous computation of fixed points. Math. Program. 27(1), 107–120 (1983)
Chazan, D., Miranker, W.: Chaotic relaxation. Linear Algebr. Appl. 2(2), 199–222 (1969)
Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. Commun. ACM 51(1) (2008)
Fan, W., Hu, C., Tian, C.: Incremental graph computations: doable and undoable. In: SIGMOD (2017)
Fan, W., Li, J., Ma, S., Tang, N., Wu, Y., Wu, Y.: Graph pattern matching: from intractability to polynomial time. In: PVLDB (2010)
Fan, W., Wang, X., Wu, Y.: Incremental graph pattern matching. TODS 38(3) (2013)
Fan, W., Wang, X., Wu, Y., Xu, J.: Association rules with graph patterns. PVLDB 8(12), 1502–1513 (2015)
Fan, W., Xu, J., Wu, Y., Yu, W., Jiang, J.: GRAPE: parallelizing sequential graph computations. PVLDB 10(12), 1889–1892 (2017)
Fan, W., et al.: Parallelizing sequential graph computations. In: SIGMOD (2017)
Fredman, M.L., Tarjan, R.E.: Fibonacci heaps and their uses in improved network optimization algorithms. JACM 34(3), 596–615 (1987)
Gallager, R.G., Humblet, P.A., Spira, P.M.: A distributed algorithm for minimum-weight spanning trees. TOPLAS 5(1), 66–77 (1983)
Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: PowerGraph: distributed graph-parallel computation on natural graphs. In: OSDI (2012)
Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: graph processing in a distributed dataflow framework. In: OSDI (2014)
Grujic, I., Bogdanovic-Dinic, S., Stoimenov, L.: Collecting and analyzing data from E-Government Facebook pages. In: ICT Innovations (2014)
Han, M., Daudjee, K.: Giraph unchained: barrierless asynchronous parallel execution in Pregel-like graph processing systems. PVLDB 8(9), 950–961 (2015)
Han, M., Daudjee, K., Ammar, K., Ozsu, M.T., Wang, X., Jin, T.: An experimental comparison of Pregel-like graph processing systems. PVLDB 7(12) (2014)
Henzinger, M.R., Henzinger, T., Kopke, P.: Computing simulations on finite and infinite graphs. In: FOCS (1995)
Ho, Q., et al.: More effective distributed ML via a stale synchronous parallel parameter server. In: NIPS, pp. 1223–1231 (2013)
Jones, N.D.: An introduction to partial evaluation. ACM Comput. Surv. 28(3) (1996)
Kim, M., Candan, K.S.: SBV-Cut: vertex-cut based graph partitioning using structural balance vertices. Data Knowl. Eng. 72, 285–303 (2012)
Li, M., et al.: Parameter server for distributed machine learning. In: NIPS Workshop on Big Learning (2013)
Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Distributed GraphLab: a framework for machine learning in the cloud. PVLDB 5(8) (2012)
Malewicz, G., et al.: Pregel: a system for large-scale graph processing. In: SIGMOD (2010)
McSherry, F., Isard, M., Murray, D.G.: Scalability! but at what cost? In: HotOS (2015)
Nesetril, J., Milková, E., Nesetrilová, H.: Otakar Borůvka on minimum spanning tree problem. Discret. Math. 233(1–3), 3–36 (2001)
Pingali, K., et al.: The tao of parallelism in algorithms. In: PLDI (2011)
Prim, R.C.: Shortest connection networks and some generalizations. Bell Syst. Tech. J. 36(6) (1957)
Radoi, C., Fink, S.J., Rabbah, R.M., Sridharan, M.: Translating imperative code to MapReduce. In: OOPSLA (2014)
Ramalingam, G., Reps, T.: An incremental algorithm for a generalization of the shortest-path problem. J. Algorithms 21(2), 267–305 (1996)
Ramalingam, G., Reps, T.: On the computational complexity of dynamic graph problems. TCS 158(1–2) (1996)
Raychev, V., Musuvathi, M., Mytkowicz, T.: Parallelizing user-defined aggregations using symbolic execution. In: SOSP (2015)
Shao, B., Wang, H., Li, Y.: Trinity: a distributed graph engine on a memory cloud. In: SIGMOD (2013)
Slota, G.M., Rajamanickam, S., Devine, K., Madduri, K.: Partitioning trillion-edge graphs in minutes. In: IPDPS (2017)
Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From “think like a vertex” to “think like a graph”. PVLDB 7(3), 193–204 (2013)
Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)
Valiant, L.G.: General purpose parallel architectures. Handbook of Theoretical Computer Science, vol. A (1990)
Wang, G., Xie, W., Demers, A.J., Gehrke, J.: Asynchronous large-scale graph processing made easy. In: CIDR (2013)
Xie, C., Yan, L., Li, W.-J., Zhang, Z.: Distributed power-law graph computing: theoretical and empirical analysis. In: NIPS (2014)
Xing, E.P., Ho, Q., Dai, W., Kim, J.K., Wei, J., Lee, S., Zheng, X., Xie, P., Kumar, A., Yu, Y.: Petuum: a new platform for distributed machine learning on big data. IEEE Trans. Big Data 1(2), 49–67 (2015)
Yan, D., Cheng, J., Lu, Y., Ng, W.: Blogel: a block-centric framework for distributed computation on real-world graphs. PVLDB 7(14), 1981–1992 (2014)
Zhou, Y., Liu, L., Lee, K., Pu, C., Zhang, Q.: Fast iterative graph computation with resource aware graph parallel abstractions. In: HPDC (2015)
Acknowledgments
The paper is a tribute to Professor Chaochen Zhou, who took Fan as an MSc student 30 years ago, despite pressure from a powerful person, whom Fan confronted to get justice done for his late former MSc adviser. The authors are supported in part by 973 Program 2014CB340302, ERC 652976, EPSRC EP/M025268/1, NSFC 61421003, Beijing Advanced Innovation Center for Big Data and Brain Computing, Shenzhen Peacock Program 1105100030834361, and Joint Research Lab between Edinburgh and Huawei.
Copyright information
© 2018 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Fan, W., Liu, M., Xu, R., Hou, L., Li, D., Meng, Z. (2018). Think Sequential, Run Parallel. In: Jones, C., Wang, J., Zhan, N. (eds) Symposium on Real-Time and Hybrid Systems. Lecture Notes in Computer Science, vol. 11180. Springer, Cham. https://doi.org/10.1007/978-3-030-01461-2_1
DOI: https://doi.org/10.1007/978-3-030-01461-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01460-5
Online ISBN: 978-3-030-01461-2