ABSTRACT
Bisimulation summaries of graph data have multiple applications, including facilitating graph exploration and enabling query optimization techniques, but efficient and scalable summary construction is challenging. The literature describes parallel construction algorithms based on message passing, and these have recently been adapted to MapReduce environments. The fixpoint nature of bisimulation is well suited to iterative graph processing, but the existing MapReduce solutions do not drastically decrease per-iteration times as the computation progresses.
In this paper, we focus on leveraging parallel multi-core graph frameworks with the goal of constructing summaries, for a range of real-world data graphs, in roughly the same amount of time that it takes to input the data into the framework and output the summary. To achieve this goal, we introduce a singleton optimization that significantly reduces per-iteration times after only a few iterations. We present experimental results validating that our scalable GraphChi implementation achieves this goal for bisimulation summaries of graphs ranging from millions to billions of edges.
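As a rough illustration of the fixpoint computation and of why skipping singletons can shrink per-iteration work, the following single-threaded Python sketch refines a partition by node signatures. It is not the GraphChi implementation described in the paper. The function name `bisimulation_partition`, the `skip_singletons` flag, the input layout, and the choice of forward (outgoing-edge) bisimulation are all assumptions made for this example. The sketch relies on one observation: a node that is alone in its block cannot be split further, so it can keep its block identifier and be left out of later iterations.

```python
from collections import Counter


def bisimulation_partition(nodes, edges, skip_singletons=True):
    """Signature-based partition refinement (illustrative sketch only).

    nodes: dict mapping node -> node label
    edges: iterable of (source, edge_label, target) triples
    Returns (block_of, work), where work[i] is the number of nodes whose
    signature was recomputed in iteration i.
    """
    out = {v: [] for v in nodes}
    for src, lab, dst in edges:
        out[src].append((lab, dst))

    # Initial partition: nodes with the same label share a block.
    label_ids = {}
    block = {v: label_ids.setdefault(lab, len(label_ids))
             for v, lab in nodes.items()}
    next_id = len(label_ids)  # fresh ids never collide with frozen ones
    work = []

    while True:
        sizes = Counter(block.values())
        # Assumed singleton rule: a node alone in its block is frozen.
        active = [v for v in nodes
                  if not (skip_singletons and sizes[block[v]] == 1)]
        work.append(len(active))

        sig_ids = {}
        new_block = dict(block)  # frozen nodes keep their block id
        for v in active:
            # Signature: own block plus the set of (edge label, target block).
            sig = (block[v], frozenset((lab, block[w]) for lab, w in out[v]))
            if sig not in sig_ids:
                sig_ids[sig] = next_id
                next_id += 1
            new_block[v] = sig_ids[sig]

        # Fixpoint: refinement only splits blocks, so an unchanged block
        # count means no block was split in this iteration.
        if len(set(new_block.values())) == len(sizes):
            return block, work
        block = new_block
```

Calling the function twice on the same graph, once with `skip_singletons=True` and once with `False`, yields the same partition, while the `work` list of the first call records how many nodes were still active in each iteration. In a vertex-centric framework, the analogous effect would be fewer vertices scheduled per iteration, although how the paper realizes this in GraphChi is not stated in the abstract.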
Index Terms
- Constructing Bisimulation Summaries on a Multi-Core Graph Processing Framework