Abstract
The MapReduce programming model is widely used to parallelize data processing across large clusters of commodity computers. However, because of its uniform data representation, it can neither express graph-parallel algorithms naturally nor execute them efficiently. Pregel and PowerGraph address these challenges, but they require users to learn another set of programming patterns and platforms, and legacy MapReduce code becomes incompatible and unusable. In this paper, we propose Graph-compatible MapReduce (GMR) as an extension of Google’s Standard MapReduce (SMR). With GMR, graph-parallel algorithms can be expressed naturally without compromising efficiency or simplicity, while the conventional MapReduce programming pattern is preserved. Users also gain the convenience of “Think like a vertex”. Through experiments, we analyze the proportion of redundant computation, transmission, and data caching introduced by naive iterative MapReduce platforms (e.g., HaLoop, Twister). Furthermore, we discuss the differences between GMR and graph-targeted frameworks. The evaluation results show that GMR outperforms GraphX on a series of real-world graph-parallel algorithms.
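To make the redundancy mentioned above concrete, the following is a minimal sketch of one PageRank iteration written in a plain map/reduce style. It is not GMR's API, which this abstract does not describe; the function names (`map_vertex`, `reduce_vertex`, `pagerank_iteration`), the toy graph, and the damping factor of 0.85 are all assumptions made for illustration. The point it shows is that, without persistent graph state, each iteration has to re-emit the adjacency lists alongside the rank messages.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional PageRank damping factor (an assumption here)


def map_vertex(vertex, state):
    """Map step, written from the point of view of a single vertex."""
    rank, neighbors = state
    # Plain MapReduce keeps no graph state between jobs, so the adjacency
    # list is re-emitted on every iteration: this is redundant transmission.
    yield vertex, ("graph", neighbors)
    for neighbor in neighbors:
        yield neighbor, ("rank", rank / len(neighbors))


def reduce_vertex(vertex, values, num_vertices):
    """Reduce step: rebuild the vertex state from the shuffled messages."""
    neighbors = []
    total = 0.0
    for kind, payload in values:
        if kind == "graph":
            neighbors = payload
        else:
            total += payload
    rank = (1 - DAMPING) / num_vertices + DAMPING * total
    return vertex, (rank, neighbors)


def pagerank_iteration(graph):
    """Run one map, shuffle, and reduce pass over the whole graph."""
    shuffled = defaultdict(list)
    for vertex, state in graph.items():
        for key, value in map_vertex(vertex, state):
            shuffled[key].append(value)
    n = len(graph)
    return dict(reduce_vertex(v, vals, n) for v, vals in shuffled.items())


# Hypothetical three-vertex graph: vertex -> (initial rank, out-neighbors).
graph = {
    "a": (1 / 3, ["b", "c"]),
    "b": (1 / 3, ["c"]),
    "c": (1 / 3, ["a"]),
}
for _ in range(20):
    graph = pagerank_iteration(graph)
```

In this sketch the `("graph", neighbors)` record carries no new information after the first iteration, yet it is shuffled every time. Iterative MapReduce platforms and graph-targeted frameworks differ mainly in how they avoid or cache that kind of traffic.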
Change history
17 October 2017
In the original publication, Fig. 12 was incorrectly presented. The plot line and legends of Fig. 12a, c, e and f should not overlap. The original article was corrected.
Acknowledgments
We thank the Institute of Process Engineering, Chinese Academy of Sciences, for their help. This research was supported by the Zhejiang Engineering Research Center of Intelligent Medicine (2016E10011) and by the project on the research and application of key technologies for rapid individualized sculpture manufacture and carving stone materials appraisal.
Additional information
The original version of this article was revised: The plot line and legends of Fig. 12a, c, e and f should not overlap.
Cite this article
Zhang, W., He, B., Chen, Y. et al. GMR: graph-compatible MapReduce programming model. Multimed Tools Appl 78, 457–475 (2019). https://doi.org/10.1007/s11042-017-5102-2
DOI: https://doi.org/10.1007/s11042-017-5102-2