Abstract
Graphs can model many kinds of data, from traditional datasets to social networks and semi-structured data. Many systems have been proposed for processing large graphs, and the Pregel programming model is popular thanks to its scalability. Although Pregel is simple to understand and use, it is low-level: developers must write programs that are hard to maintain and need careful optimization. Structural recursion, on the other hand, is a powerful way to systematically construct efficient parallel programs on lists, arrays, and trees, but it has not yet been applied to graphs. In this paper, we propose an efficient method for parallel evaluation of structural recursion on graphs that is suitable for Pregel. We design and implement a high-level parallel programming framework that provides a domain-specific language (DSL) to ease the programming task. Specifications written in the DSL are automatically compiled into Pregel programs that scale to large graphs. Experimental results show that our framework outperforms the original evaluation of structural recursion and achieves good scalability and speedup on real datasets.
Notes
http://arnetminer.org/billboard/citation, dataset: citation-network V1.
http://netsg.cs.sfu.ca/youtubedata/, dataset: 0222, Feb. 22nd, 2007.
Cite this article
Tung, LD., Hu, Z. Towards Systematic Parallelization of Graph Transformations Over Pregel. Int J Parallel Prog 45, 320–339 (2017). https://doi.org/10.1007/s10766-016-0418-5
DOI: https://doi.org/10.1007/s10766-016-0418-5